Abstract

Computing the general 3-D motion of a camera from the changes in its images as it 
moves through the environment is an important problem in autonomous navigation and 
in machine vision in general. The goal of this research has been to establish the complete 
design framework for a small, autonomous system with specialized analog and digital VLSI 
hardware for computing 3-D camera motion in real-time. Such a system would be suitable 
for mounting on mobile or remote platforms that cannot be tethered to a computer and for 
which the size, weight and power consumption of the components are critical factors. 

Combining algorithmic design with circuit design is essential for building a robust size- 
and power-efficient system, as there are constraints imposed both by technology and by 
the nature of the problem which can be more efficiently satisfied jointly. The first part of 
this thesis is thus devoted to the analysis and development of the algorithms used in the 
system and implemented by the special processors. Among the major theoretical results 
presented in Part I are the development of the multi-scale veto edge detection algorithm, the 
derivation of a simplified method for solving the motion equations, and a complete analysis 
of the effects of measurement errors on the reliability of the motion estimates. 

In the proposed system architecture, the first step is to determine point correspondences 
between two successive images. Two specialized processors, the first a CCD array edge de- 
tector implementing the multi-scale veto algorithm, and the second a mixed analog/digital 
binary block correlator, are proposed and designed for this task. A prototype CCD edge 
detector was fabricated through MOSIS, and based on the test results from this chip, im- 
provements are suggested so that a full-size focal-plane processor can be built. The design 
of the mixed analog/digital correlator is compared with a fully digital implementation and 
is seen to yield a significant reduction in silicon area without compromising operating speed. 
In the conclusions, the theoretical and experimental results from the different parts of this 
thesis are combined into a single design proposal for the complete motion system. 
Chapter 1



Introduction 



It is difficult to overstate the potential usefulness of an automated system to compute 
motion from visual information for numerous applications involving the control and navigation
of moving vehicles. Deducing 3-D motion by measuring the changes in successive images
taken by a camera moving through the environment involves determining the perspective 
transformation between the coordinate systems defined by the camera's principal axes at 
its different locations. Solving this problem is important not only for motion estimation, 
but also for determining depth from binocular stereopairs. It is in fact equivalent to the 
classic problem in photogrammetry of relative orientation, for which methods were devel- 
oped by cartographers over a hundred years ago to measure the topography of large scale 
land masses [1], [2], [3], [4], [5]. 

Computing relative orientation involves two difficult subproblems. First, corresponding 
points must be identified in the two images, and second, the nonlinear equations defining the 
perspective transformation must be inverted to solve for the parameters of the motion. Not 
all image pairs allow an unambiguous determination of their relative orientation, however. 
For some configurations of points there is not a unique solution to the motion equations 
and for many others the problem is ill-conditioned [6], [7], [3], [4], [8]. The methods devel- 
oped long ago for cartography relied on considerable human intervention to overcome these 
difficulties. Large optical devices known as stereoplotters were invented to align match- 
ing features using a floating mark positioned by an operator, while general knowledge of 
the geometry of the scene and the camera positions was used to aid in solving the motion 
equations¹.

Efforts to construct autonomous systems have also been limited by the complexity of the 
task. At present, machine vision algorithms for computing camera motion and alignment 
have reached a level of sophistication in which they can operate under special conditions in 
restricted environments. Among the systems which have been developed, however, there is a 
strong correlation between robustness and the amount of computational resources employed. 
Two approaches which are commonly taken are to either impose very restrictive conditions 
on the type and amount of relative motion allowed, in which case simple algorithms can be 
used to yield qualitatively correct results as long as the basic assumptions are not violated; 
or to relax the restrictions and therefore implement the system with complex algorithms 
that require powerful processors. 

The goal of this thesis is to go beyond these limitations and to design a system that is 
both unrestrictive and that uses minimal hardware. Specifically, the objectives of such a 
system are the following: 

• The complete system should be physically small and should operate with minimal 
power. This is particularly important if it is to be used in remote or inaccessible 
environments. 

• It should allow real-time, frame-rate operation. 

• It must be able to either produce an accurate estimate of the motion for the majority 
of situations it is likely to encounter or to recognize and report that a reliable estimate 
cannot be obtained from the given images. 

• The system should be self-contained, in the sense that neither external processing nor 
outside intervention is required to determine accurate estimates of the camera motion. 

A central tenet of this thesis is that in order to meet the first two requirements, spe- 
cialized VLSI processors, combining both analog and digital technology, are needed to 
perform specific tasks within the system. Clearly, meeting the last two requirements does 
not necessitate special hardware since they influence only the choice of the algorithms to 



¹ Cartography is still a labor-intensive process, although, in the interest of developing geographic
information systems (GIS), there have been many efforts in the last decade to automate mapping techniques
by applying algorithms developed for machine vision ([9]; see also the April 1983 issue of Photogrammetric
Engineering and Remote Sensing).
be implemented. Obviously, any algorithm which can be wired into a circuit can be
programmed on general purpose digital hardware. The motivation for designing specialized
processors is the idea that in doing so a significant reduction in size and power consumption 
can be achieved over general purpose hardware. 

There are many aspects to a completely general system for computing camera motion 
and alignment, and it is necessary to define the limits of this study. Specifically, 

• Only image sequences from passive navigation will be examined. In other words, it 
is always assumed that the environment is static and all differences observed in the 
images are due to differences in camera position. The case of multiple independently 
moving objects in the scene will not be explicitly addressed. 

• The system design is based entirely on the problem of estimating motion from two 
frames only. Many researchers [10], [11], [12] have proposed the use of multiple frames 
in order to improve reliability, on the grounds that the results from two frames are 
overly sensitive to error and are numerically unstable. The philosophy of the present 
approach is that it is necessary to build a system which can extract the best results 
possible from two frames in order to make a multi-frame system even more reliable. 
Nothing in the design of the present system will prevent it from being used as a module 
within a more comprehensive multi-frame system. 

The goal of this thesis is not to build the complete motion system, but to develop 
the theory on which it is based, and to design the specialized processors needed for its 
operation. This thesis is divided into three major parts. Part I covers the theoretical issues 
of selecting and adapting the algorithms to be implemented by the special processors. It 
also includes a complete analysis of the numerical stability of the motion algorithm and 
of the sensitivity of its estimates to errors in the data. Parts II and III are concerned 
with the design of the processors needed for determining point correspondences. One of the
conclusions of Part I is that matching edges in the two images by binary block correlation 
is the most suitable method for implementation in VLSI. Part II describes a prototype edge 
detector built in CCD-CMOS technology which implements the multi-scale veto algorithm 
presented in Chapter 5, while Part III examines the benefits of combining analog and digital 
processing to design an area-efficient edge matching circuit. 



Part I 



Algorithms and Theory 
Chapter 2



Methods for Computing Motion and Structure 



Computing motion and structure from different views involves two operations: match- 
ing features in the different images, and solving the motion equations for the rigid body 
translation and rotation which best describes the observed displacements of brightness pat- 
terns in the image plane. Methods can be grouped into two categories according to whether 
features in the two images are matched explicitly or implicitly. Explicit methods generate 
a discrete set of feature correspondences and solve the motion equations using the known 
coordinates of the pairs of matched points in the set. Implicit methods formulate the motion 
equations in terms of the temporal and spatial derivatives of brightness and the incremental 
displacements in the image plane, or optical flow. In order to avoid explicit matching, these 
methods incorporate additional constraints, such as brightness constancy and smoothness 
of the optical flow, and derive the motion from a global optimization procedure. 

There are advantages and weaknesses to both approaches. Explicit methods require few 
assumptions other than rigid motion; however, they must first solve the difficult problem of 
finding an accurate set of point correspondences. In addition, although more of a concern 
for determining structure from binocular stereo than for computing motion, depth can only 
be recovered for points in the set. Implicit methods circumvent the correspondence problem 
but in exchange must make more restrictive assumptions on the environment. Since they are 
based on approximating brightness derivatives from sampled data, they are both sensitive 
to noise and sensor variation and susceptible to aliasing. 

In the interest of removing as many restrictions as possible, an explicit approach has 
been adopted for the present system. Details of the specific methods which will be used to 
perform the tasks of finding point correspondences and solving the motion equations will
be described in the following chapters. In this chapter, in order to situate this research 
with respect to related work, the basic methods for computing motion and alignment are 
presented in a more general context. I will first rederive the fundamental equations of 
perspective geometry, presenting the notation which is used throughout the thesis, and will 
then discuss several of the more significant algorithms which have been developed for both 
the explicit and the implicit approaches. Finally, I will review several previous and ongoing 
efforts to build systems in VLSI based on these different methods. 

2.1 Basic Equations 

In order to express the motion equations in terms of image plane coordinates, it is first 
necessary to formulate the relation between the 3-D coordinates of objects in the scene and 
their 2-D projections. Once the motion is known, the projection equations can be inverted 
to yield the 3-D coordinates of the features which were matched in the images. 

The exact projective relation for real imaging systems is in general nonlinear. However, 
it can usually be well approximated by a linear model. If greater precision is needed, 
nonlinearities can be accounted for either by adding higher order terms or by pre-warping
the image plane coordinates to fit the linear model. Of the two choices most commonly used
for the basic linear relation (orthographic and perspective), only perspective projection can
meet the requirements of the present system. Orthographic projection, which approximates
rays from the image plane to objects in the scene as parallel straight lines, is the limiting case
of perspective projection as the field of view goes to zero or as the distance to objects in
the scene goes to infinity. Orthographic projection has often been used in machine vision for 
the recovery of structure and motion [13], [14] because it simplifies the motion equations. 
In the orthographic model the projected coordinates are independent of depth and are 
therefore uncoupled. For the same reason, however, it is impossible to uniquely determine 
motion from two orthographic views [15]. 

2.1.1 Perspective geometry 

Under the assumption of perfect perspective projection, such as would be obtained with 
an ideal pinhole camera, we define the camera coordinate system as shown in Figure 2-1 
with origin at the center of projection. The image plane is perpendicular to the z axis and is
located at a distance f, which is the effective focal length, from the origin. In a real camera,
of course, the image sensor is located behind the optical center of projection at z = -f. It
is customary, however, to represent the image plane as shown in the diagram in order to
avoid the use of negative coordinates. The point where the z, or optical, axis pierces the
image plane is referred to as the principal point.

Figure 2-1: Geometry of perspective projection.

Let $\mathbf{p} = (X, Y, Z)^T$ represent the position vector in the camera coordinate system of a
point P in the scene and let $\hat{\mathbf{p}} = (X/Z, Y/Z, 1)^T$ denote the 2-dimensional homogeneous
representation of $\mathbf{p}$. Two world points $P_i$ and $P_j$ are projectively equivalent with respect
to a plane perpendicular to the z-axis if and only if $\hat{\mathbf{p}}_i = \hat{\mathbf{p}}_j$. For the world point P to
be imaged on the plane z = f at P', whose coordinates are (x, y, f), P and P' must be
projectively equivalent. In other words

$$\frac{x}{f} = \frac{X}{Z}, \qquad \frac{y}{f} = \frac{Y}{Z} \tag{2.1}$$



Since image irradiance, or brightness, is always sampled discretely by an array of
photosensors, it is convenient to define a secondary set of coordinates, $(m_x, m_y)$, on the array
of picture cells such that the center of each pixel is located at integer values of $m_x$ and
$m_y$. The vector $\mathbf{m} = (m_x, m_y, 1)^T$ is related by a linear transformation matrix $K_c$ to the
2-D homogeneous representation, in the camera coordinate system, of all points which are
projectively equivalent to (x, y, f):

$$\mathbf{m} = K_c\,\hat{\mathbf{p}} \tag{2.2}$$

Under the conditions illustrated in Figure 2-1, $K_c$ has the special form

$$K_c = \begin{pmatrix} f/s_x & 0 & m_{x0} \\ 0 & f/s_y & m_{y0} \\ 0 & 0 & 1 \end{pmatrix} \tag{2.3}$$

where f is the effective focal length; $s_x$ and $s_y$ are the physical distances, measured in the
same units as f, between pixel centers along the orthogonal x- and y-axes; and $(m_{x0}, m_{y0})$
is the location, in pixel coordinates, of the principal point. The matrix $K_c$ is referred to
as the internal camera calibration matrix and must be known before scene structure and
camera motion can be recovered from the apparent motion of brightness patterns projected
onto the image plane.
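As a concrete illustration of (2.1)-(2.3), the following sketch maps a scene point given in camera coordinates to pixel coordinates. It assumes NumPy and an ideal pinhole camera; the focal length, pixel pitch, and principal point are hypothetical values chosen only for the example, and the helper names are illustrative.

```python
import numpy as np


def calibration_matrix(f, sx, sy, mx0, my0):
    """Internal calibration matrix K_c of (2.3) for an ideal pinhole camera."""
    return np.array([[f / sx, 0.0, mx0],
                     [0.0, f / sy, my0],
                     [0.0, 0.0, 1.0]])


def project_to_pixels(P, Kc):
    """Pixel coordinates (m_x, m_y) of a scene point P = (X, Y, Z) given in camera coordinates."""
    X, Y, Z = P
    p_hat = np.array([X / Z, Y / Z, 1.0])  # homogeneous representation (X/Z, Y/Z, 1)^T
    m = Kc @ p_hat                         # equation (2.2)
    return m[:2]


# Hypothetical camera: f = 8 mm, 10 um square pixels, principal point at pixel (256, 256).
f, s = 8e-3, 10e-6
Kc = calibration_matrix(f, s, s, 256.0, 256.0)
P = (0.1, -0.05, 2.0)                      # metres, camera frame
m = project_to_pixels(P, Kc)

# Cross-check with (2.1): the image-plane point (x, y) lies x/s_x, y/s_y pixels from the principal point.
x, y = f * P[0] / P[2], f * P[1] / P[2]
assert np.allclose(m, [256.0 + x / s, 256.0 + y / s])
```

The diagonal entries f/s_x and f/s_y simply convert image-plane distances into pixel counts, which is why K_c must be known before image measurements can be interpreted metrically.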

In real devices, $K_c$ seldom has exactly the form of (2.3) due to factors such as the
misalignment of the image sensor and spherical aberrations in the lens. Finding the appropriate
transformation is a difficult problem, and consequently numerous methods have been
developed, involving varying degrees of complexity, to determine internal calibration [16], 
[17], [18], [19], [20], [21]. Discussing these methods, however, goes well beyond the scope of 
this thesis, and so it will be assumed for present purposes that the calibration is known. 

2.1.2 The epipolar constraint 

Suppose we have two images taken at two different camera positions, which we will refer
to as right and left. Let $\mathbf{p}_r$ and $\mathbf{p}_l$ denote the position vectors, with respect to the right
and left coordinate systems, of a fixed point P in the environment, $\mathbf{p}_r = (X_r, Y_r, Z_r)^T$ and
$\mathbf{p}_l = (X_l, Y_l, Z_l)^T$. Assuming a fixed environment so that rigid body motion is applicable,
$\mathbf{p}_r$ and $\mathbf{p}_l$ are related by

$$\mathbf{p}_r = R\,\mathbf{p}_l + \mathbf{b} \tag{2.4}$$

where R denotes an orthonormal rotation matrix, and b is the baseline vector connecting
the origins of the two systems. A necessary condition for the vectors $\mathbf{p}_r$ and $\mathbf{p}_l$ to intersect
at P is that they be coplanar with the baseline, b, or equivalently, that the triple product
of the three vectors vanish:

$$\mathbf{p}_r \cdot (\mathbf{b} \times R\,\mathbf{p}_l) = 0 \tag{2.5}$$



Given R, b, and the coordinates $(x_l, y_l, f)$ of the point $P'_l$, which is the projection of P in
the left image, equation (2.5) defines a line in the right image upon which the corresponding
point $P'_r$ at $(x_r, y_r, f)$ must lie. This line, known as the epipolar line, is the intersection of
the image plane with the plane containing the point P and the baseline b. The position of
$P'_r$ on the epipolar line is determined by the depth, or z-coordinate, of P in the left camera
system. To see this, define the variables ξ, η, and φ to represent the components of the
rotated homogeneous vector $R\hat{\mathbf{p}}_l$:

$$\begin{pmatrix} \xi \\ \eta \\ \phi \end{pmatrix} = R\,\hat{\mathbf{p}}_l \tag{2.6}$$

Then, since $\mathbf{p}_l = Z_l\,\hat{\mathbf{p}}_l$, equation (2.4) can be expressed in component form as

$$\begin{pmatrix} X_r \\ Y_r \\ Z_r \end{pmatrix} = Z_l \begin{pmatrix} \xi \\ \eta \\ \phi \end{pmatrix} + \begin{pmatrix} b_x \\ b_y \\ b_z \end{pmatrix} \tag{2.7}$$

The projection of P onto the right image is found from

$$x_r = f\,\frac{X_r}{Z_r} = f\,\frac{Z_l\,\xi + b_x}{Z_l\,\phi + b_z} \tag{2.8}$$

and

$$y_r = f\,\frac{Y_r}{Z_r} = f\,\frac{Z_l\,\eta + b_y}{Z_l\,\phi + b_z} \tag{2.9}$$

$P'_r$ thus varies along the epipolar line between $f(b_x/b_z,\ b_y/b_z)$ when $Z_l = 0$ and $f(\xi/\phi,\ \eta/\phi)$
when $Z_l = \infty$. The first point, known as the epipole, is independent of $x_l$ and $y_l$, and is
therefore common to all epipolar lines. By rewriting (2.4) as

$$\mathbf{p}_l = R^T\mathbf{p}_r + \mathbf{b}' \tag{2.10}$$

where $\mathbf{b}' = -R^T\mathbf{b}$, a similar relation can be obtained for the coordinates of the point $P'_l$, the
projection of P onto the left image plane, in terms of $Z_r$ and the components of the rotated
vector $R^T\hat{\mathbf{p}}_r$. The geometry of the epipolar transformation is illustrated in Figure 2-2.
Figure 2-2: Epipolar geometry.
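Equations (2.6)-(2.9) can be exercised numerically. The sketch below (Python with NumPy; the rotation, baseline, and left-image point are arbitrary illustrative values, and the function names are hypothetical) traces the right-image position of a fixed left-image point as its depth $Z_l$ varies, and checks that every candidate lies on one line through the epipole and the $Z_l = \infty$ point, and satisfies the coplanarity constraint (2.5).

```python
import numpy as np


def rotation_y(theta):
    """Rotation by theta about the camera y-axis (an arbitrary choice for this example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def right_image_point(x_l, y_l, Z_l, R, b, f):
    """Right-image projection (2.8)-(2.9) of the point seen at (x_l, y_l) in the left image at depth Z_l."""
    xi, eta, phi = R @ np.array([x_l / f, y_l / f, 1.0])   # equation (2.6)
    denom = Z_l * phi + b[2]
    return np.array([f * (Z_l * xi + b[0]) / denom,
                     f * (Z_l * eta + b[1]) / denom])


# Hypothetical relative orientation and left-image point.
f = 1.0
R = rotation_y(0.1)
b = np.array([1.0, 0.2, 0.3])
x_l, y_l = 0.2, -0.1

epipole = f * b[:2] / b[2]                          # limit Z_l -> 0
xi, eta, phi = R @ np.array([x_l / f, y_l / f, 1.0])
p_inf = f * np.array([xi / phi, eta / phi])         # limit Z_l -> infinity

for Z_l in (0.5, 1.0, 2.0, 5.0, 20.0):
    p = right_image_point(x_l, y_l, Z_l, R, b, f)
    # The candidate lies on the line through the epipole and p_inf ...
    d1, d2 = p_inf - epipole, p - epipole
    assert abs(d1[0] * d2[1] - d1[1] * d2[0]) < 1e-9
    # ... and satisfies the coplanarity constraint (2.5).
    r = np.array([p[0], p[1], f])
    l = np.array([x_l, y_l, f])
    assert abs(r @ np.cross(b, R @ l)) < 1e-9
```

Sweeping $Z_l$ in this way is exactly the one-dimensional search that the epipolar constraint reduces stereo matching to when R and b are known.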

Binocular stereopsis is based on the fact that the depth of objects in the scene can be
recovered from their projections in two images with a known relative orientation. From
(2.8) and (2.9) it is easily seen that

$$Z_l = \frac{f b_x - b_z x_r}{\phi x_r - f\xi} = \frac{f b_y - b_z y_r}{\phi y_r - f\eta} \tag{2.11}$$

The quantity $\phi x_r - f\xi$ is known as the horizontal disparity, $d_H$, and the quantity $\phi y_r - f\eta$
as the vertical disparity, $d_V$. If R = I and $\mathbf{b} = (|\mathbf{b}|, 0, 0)^T$, then $\xi = x_l/f$, $\eta = y_l/f$, $\phi = 1$, and equation
(2.11) reduces to the familiar parallel geometry case in which

$$Z_l = \frac{f\,|\mathbf{b}|}{x_r - x_l} \tag{2.12}$$

and for which the vertical disparity is necessarily zero.
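A minimal sketch of depth recovery with (2.11), assuming R, b, and f are known and the correspondence is exact. Choosing the expression with the larger disparity is a design choice made here (not taken from the text): both forms agree for exact data, but one of them can be ill-conditioned, and in the parallel case (2.12) the vertical form reduces to 0/0. The motion and scene points are hypothetical.

```python
import numpy as np


def depth_from_disparity(x_r, y_r, x_l, y_l, R, b, f):
    """Left-camera depth Z_l of a matched pair via equation (2.11)."""
    xi, eta, phi = R @ np.array([x_l / f, y_l / f, 1.0])
    d_h = phi * x_r - f * xi    # horizontal disparity d_H
    d_v = phi * y_r - f * eta   # vertical disparity d_V
    # Both forms of (2.11) agree for exact data; use the one with the larger denominator.
    if abs(d_h) >= abs(d_v):
        return (f * b[0] - b[2] * x_r) / d_h
    return (f * b[1] - b[2] * y_r) / d_v


def project(p, f):
    """Perspective projection (2.1) of a camera-frame point onto the plane z = f."""
    return f * p[0] / p[2], f * p[1] / p[2]


# General case with a hypothetical motion: rotation of 0.1 rad about z plus a baseline b.
f = 1.0
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
b = np.array([0.8, 0.1, 0.2])
p_l = np.array([0.3, -0.2, 4.0])
p_r = R @ p_l + b                                   # rigid-body motion (2.4)
(x_l, y_l), (x_r, y_r) = project(p_l, f), project(p_r, f)
Z = depth_from_disparity(x_r, y_r, x_l, y_l, R, b, f)

# Parallel geometry (2.12): R = I, b = (|b|, 0, 0).
b_par = np.array([0.5, 0.0, 0.0])
q_l = np.array([1.0, 0.5, 5.0])
(xl, yl), (xr, yr) = project(q_l, f), project(q_l + b_par, f)
Z_par = depth_from_disparity(xr, yr, xl, yl, np.eye(3), b_par, f)
```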



2.2 Computing Motion by Matching Features 

In this section, we will examine only those methods which compute motion from point 
correspondences. Although algorithms using higher level features, such as lines and planes,
have been proposed ([22], [23], [24], [25]), these usually require more than two views and 
also are not as practical for hardware implementation. 

To compute camera motion, or relative orientation, we need to find the rotation and
baseline vector that best describe the transformation between the right and left camera
systems. We assume that we have a set of N pairs, $\{(\mathbf{r}_i, \boldsymbol{\ell}_i)\}$, $i = 1, \ldots, N$, where $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$
are the position vectors of the points $P'_{r_i}$ and $P'_{l_i}$ which are projectively equivalent, in the
right and left systems respectively, to the same world point P.

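The fundamental equation for computing the coordinate transformation between the two
camera systems is the coplanarity constraint given by equation (2.5). Since this equation is
homogeneous, it can be written in terms of $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$ as

$$\mathbf{r}_i \cdot (\mathbf{b} \times R\,\boldsymbol{\ell}_i) = 0 \tag{2.13}$$

Equation (2.13) is unaffected by the lengths of any of the vectors $\mathbf{r}_i$, $\boldsymbol{\ell}_i$, or $\mathbf{b}$. We usually
set $|\mathbf{b}| = 1$ so that the baseline length becomes the unit of measure for all distances in the
scene. The vectors $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$ may be set equal to the homogeneous vectors $\hat{\mathbf{p}}_{r_i}$ and $\hat{\mathbf{p}}_{l_i}$,

$$\mathbf{r}_i = \hat{\mathbf{p}}_{r_i}, \qquad \boldsymbol{\ell}_i = \hat{\mathbf{p}}_{l_i} \tag{2.14}$$

or may also be assigned unit length:

$$\mathbf{r}_i = \frac{\mathbf{p}_{r_i}}{|\mathbf{p}_{r_i}|}, \qquad \boldsymbol{\ell}_i = \frac{\mathbf{p}_{l_i}}{|\mathbf{p}_{l_i}|} \tag{2.15}$$

The second choice is often referred to as spherical projection.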
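The sketch below (NumPy; the motion and scene point are hypothetical, and `coplanarity_residual` is an illustrative name) evaluates the triple product of (2.13) for one exact correspondence, using both the homogeneous vectors of (2.14) and the unit vectors of (2.15).

```python
import numpy as np


def coplanarity_residual(r, l, R, b):
    """Triple product r . (b x R l) of equation (2.13); zero for an exact correspondence."""
    return float(np.dot(r, np.cross(b, R @ l)))


# Hypothetical motion with a unit-length baseline, |b| = 1.
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.5, 0.2])
b /= np.linalg.norm(b)

p_l = np.array([0.4, -0.3, 3.0])
p_r = R @ p_l + b                                            # rigid-body motion (2.4)

r_h, l_h = p_r / p_r[2], p_l / p_l[2]                        # homogeneous vectors (2.14)
r_u, l_u = p_r / np.linalg.norm(p_r), p_l / np.linalg.norm(p_l)   # unit vectors (2.15)

print(coplanarity_residual(r_h, l_h, R, b), coplanarity_residual(r_u, l_u, R, b))
```

Both residuals vanish up to rounding error. Once the data are inexact, however, the residual is bilinear in the lengths of $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$, so the choice between (2.14) and (2.15) changes how individual correspondences are weighted in a least-squares fit.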

2.2.1 Representing rotation 

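There are several ways to represent rotation, including orthonormal matrices as in (2.13).
The matrix form is not always the best choice, however, as it requires nine coefficients, even
though there are only three degrees of freedom. A rotation is completely specified by the
pair $(\theta, \hat{\boldsymbol{\omega}})$, where θ represents the angle of rotation about the axis $\hat{\boldsymbol{\omega}} = (\omega_x, \omega_y, \omega_z)^T$, with
$|\hat{\boldsymbol{\omega}}| = 1$. The relation between the orthonormal matrix R and $(\theta, \hat{\boldsymbol{\omega}})$ is given by

$$R = \begin{pmatrix}
\cos\theta + \omega_x^2(1-\cos\theta) & \omega_x\omega_y(1-\cos\theta) - \omega_z\sin\theta & \omega_x\omega_z(1-\cos\theta) + \omega_y\sin\theta \\
\omega_x\omega_y(1-\cos\theta) + \omega_z\sin\theta & \cos\theta + \omega_y^2(1-\cos\theta) & \omega_y\omega_z(1-\cos\theta) - \omega_x\sin\theta \\
\omega_x\omega_z(1-\cos\theta) - \omega_y\sin\theta & \omega_y\omega_z(1-\cos\theta) + \omega_x\sin\theta & \cos\theta + \omega_z^2(1-\cos\theta)
\end{pmatrix} \tag{2.16}$$

which can also be expressed as

$$R = \cos\theta\, I + (1-\cos\theta)\,\hat{\boldsymbol{\omega}}\hat{\boldsymbol{\omega}}^T + \sin\theta\,\Omega_\times \tag{2.17}$$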
where

$$\Omega_\times = \begin{pmatrix} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{pmatrix} \tag{2.18}$$

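Equation (2.17) leads directly to Rodrigues' well-known formula for the rotation of a vector
$\boldsymbol{\ell}$:

$$\begin{aligned}
R\boldsymbol{\ell} &= \cos\theta\,\boldsymbol{\ell} + (1-\cos\theta)(\hat{\boldsymbol{\omega}}\cdot\boldsymbol{\ell})\,\hat{\boldsymbol{\omega}} + \sin\theta\,\hat{\boldsymbol{\omega}}\times\boldsymbol{\ell} \\
&= \boldsymbol{\ell} + \sin\theta\,\hat{\boldsymbol{\omega}}\times\boldsymbol{\ell} + (1-\cos\theta)\,\hat{\boldsymbol{\omega}}\times(\hat{\boldsymbol{\omega}}\times\boldsymbol{\ell})
\end{aligned} \tag{2.19}$$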
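To make (2.17)-(2.19) concrete, here is a short NumPy sketch that builds R from an angle and axis and checks it against Rodrigues' formula applied directly to a vector; the angle, axis, and vector are arbitrary, and the function names are illustrative.

```python
import numpy as np


def cross_matrix(w):
    """Skew-symmetric matrix Omega_x of (2.18), so that cross_matrix(w) @ v == np.cross(w, v)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def rotation_matrix(theta, axis):
    """Rotation matrix of (2.17) for an angle theta about the given (not necessarily unit) axis."""
    w = np.asarray(axis, dtype=float)
    w = w / np.linalg.norm(w)
    return (np.cos(theta) * np.eye(3)
            + (1.0 - np.cos(theta)) * np.outer(w, w)
            + np.sin(theta) * cross_matrix(w))


def rodrigues(theta, axis, v):
    """Rotate the vector v directly with the second form of Rodrigues' formula (2.19)."""
    w = np.asarray(axis, dtype=float)
    w = w / np.linalg.norm(w)
    return v + np.sin(theta) * np.cross(w, v) + (1.0 - np.cos(theta)) * np.cross(w, np.cross(w, v))


theta, axis = 0.7, (1.0, -2.0, 0.5)    # arbitrary example values
R = rotation_matrix(theta, axis)
v = np.array([0.3, 0.4, -1.2])
```

The matrix form costs nine numbers but makes composition a matrix product; the vector form (2.19) needs only θ and the axis.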



A frequently useful and compact representation of rotation is the unit quaternion.
Quaternions are vectors in $\mathbb{R}^4$ which may be thought of as the composition of a scalar
and a 'vector' part [4]:

$$\mathring{a} = (a_0, \mathbf{a}) \tag{2.20}$$

where $\mathbf{a} = (a_x, a_y, a_z)^T$ is a vector in $\mathbb{R}^3$. An ordinary vector v in $\mathbb{R}^3$ is represented in
quaternion form as

$$\mathring{v} = (0, \mathbf{v}) \tag{2.21}$$

A unit quaternion is one whose magnitude, defined as the square root of its dot product
with itself, is unity:

$$\mathring{q}\cdot\mathring{q} = 1 \tag{2.22}$$

Unlike vectors in $\mathbb{R}^3$, quaternions are endowed with special operations of multiplication and
conjugation, and thus form the basis of a complete algebra. The fundamental operations
and identities of quaternion algebra are summarized in Appendix A.

The usefulness of quaternions lies in the simplicity with which rotation about an arbitrary
axis can be represented. Every unit quaternion may be written as

$$\mathring{q} = \left(\cos\frac{\theta}{2},\ \hat{\boldsymbol{\omega}}\sin\frac{\theta}{2}\right) \tag{2.23}$$

and the rotation of the vector v by an angle θ about $\hat{\boldsymbol{\omega}}$ is given by

$$\mathring{v}' = \mathring{q}\,\mathring{v}\,\mathring{q}^* \tag{2.24}$$

where $\mathring{q}^*$ represents the conjugate of $\mathring{q}$.

In the following discussions I will alternate between these different representations, to 
use whichever form is best suited to the problem at hand. 
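A minimal NumPy sketch of (2.23)-(2.24). It stores a quaternion as a length-4 array with the scalar part first (an assumption of this example) and uses the standard quaternion product $(p_0, \mathbf{p})(q_0, \mathbf{q}) = (p_0 q_0 - \mathbf{p}\cdot\mathbf{q},\ p_0\mathbf{q} + q_0\mathbf{p} + \mathbf{p}\times\mathbf{q})$; the result should agree with Rodrigues' formula (2.19).

```python
import numpy as np


def qmul(p, q):
    """Quaternion product of p = (p0, p) and q = (q0, q), scalar part first."""
    p0, pv = p[0], np.asarray(p[1:])
    q0, qv = q[0], np.asarray(q[1:])
    return np.concatenate(([p0 * q0 - pv @ qv], p0 * qv + q0 * pv + np.cross(pv, qv)))


def qconj(q):
    """Conjugate q* = (q0, -q)."""
    return np.concatenate(([q[0]], -np.asarray(q[1:])))


def quat_from_axis_angle(theta, axis):
    """Unit quaternion (cos(theta/2), w sin(theta/2)) of equation (2.23)."""
    w = np.asarray(axis, dtype=float)
    w = w / np.linalg.norm(w)
    return np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * w))


def quat_rotate(q, v):
    """Rotate the 3-vector v as q (0, v) q*, equation (2.24); returns the vector part."""
    return qmul(qmul(q, np.concatenate(([0.0], v))), qconj(q))[1:]


q = quat_from_axis_angle(0.7, (1.0, -2.0, 0.5))   # arbitrary example rotation
v = np.array([0.3, 0.4, -1.2])
v_rot = quat_rotate(q, v)
```

Four numbers with one constraint (2.22) replace the nine entries and six constraints of an orthonormal matrix, which is what makes the quaternion form attractive for iterative estimation.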

2.2.2 Exact solution of the motion equations 

There are five unknown parameters in equation (2.13), two for the direction of the 
baseline and three for the rotation. It has long been known that relative orientation can 
be determined from a minimum of five points, as long as these do not lie on a degenerate 
surface. Due to the rotational component, however, the equations are nonlinear and must 
be solved by iterative methods. Furthermore, the five-point formulation admits multiple
solutions¹ [7], [4].

Thompson [26] first showed how the coplanarity conditions could be formulated as a set 
of nine homogeneous linear equations, and Longuet-Higgins [27] proposed an algorithm to 
derive the baseline vector and rotation matrix from the solution to the equations obtained 
from eight point correspondences. This algorithm is summarized as follows: 
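The first step is to rewrite the coplanarity constraint (2.13) as

$$\mathbf{r}_i \cdot (\mathbf{b} \times R\boldsymbol{\ell}_i) = -R\boldsymbol{\ell}_i \cdot (\mathbf{b} \times \mathbf{r}_i) = -\boldsymbol{\ell}_i^T R^T B_\times \mathbf{r}_i \tag{2.25}$$

where

$$B_\times = \begin{pmatrix} 0 & -b_z & b_y \\ b_z & 0 & -b_x \\ -b_y & b_x & 0 \end{pmatrix} \tag{2.26}$$

We define the matrix E as

$$E = R^T B_\times \tag{2.27}$$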



¹ Faugeras and Maybank [7] first proved that there are at most 10 solutions for the camera motion given
5 correspondences, thereby correcting a longstanding error by Kruppa [5], who had thought there were at
most 11.
and order the components of E as

$$E = \begin{pmatrix} e_1 & e_4 & e_7 \\ e_2 & e_5 & e_8 \\ e_3 & e_6 & e_9 \end{pmatrix} \tag{2.28}$$



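Let $\mathbf{a}_i$ denote the 9 × 1 vector formed from the products of the components of $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$:

$$\mathbf{a}_i = \left(r_{i,x}\ell_{i,x},\ r_{i,x}\ell_{i,y},\ r_{i,x}\ell_{i,z},\ r_{i,y}\ell_{i,x},\ r_{i,y}\ell_{i,y},\ r_{i,y}\ell_{i,z},\ r_{i,z}\ell_{i,x},\ r_{i,z}\ell_{i,y},\ r_{i,z}\ell_{i,z}\right)^T \tag{2.29}$$

and let $\mathbf{e}$ denote the 9 × 1 vector of the elements of E. Then the coplanarity constraint for
each pair of rays results in an equation of the form

$$\mathbf{a}_i^T\mathbf{e} = 0 \tag{2.30}$$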



Eight correspondences result in eight equations which can be solved to within a scale
factor for the elements of E. Given E, the baseline vector is identified as the eigenvector
corresponding to the zero eigenvalue of $E^TE$,

$$E^TE\,\mathbf{b} = \mathbf{0} \tag{2.31}$$

as can be seen from the fact that, with $|\mathbf{b}| = 1$,

$$E^TE = B_\times^T B_\times = I - \mathbf{b}\mathbf{b}^T \tag{2.32}$$
The rotation matrix R is found from

$$R = B_\times E^T + \mathrm{Cof}(E^T) \tag{2.33}$$

where $\mathrm{Cof}(E^T)$ is the matrix of cofactors of $E^T$, and E is scaled so that (2.32) holds with $|\mathbf{b}| = 1$.
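The linear procedure of (2.25)-(2.33) can be sketched with NumPy as follows. This is an illustrative reconstruction, not the original implementation: the null vector of the 8 × 9 system is taken from an SVD, E is rescaled so that trace($E^TE$) = 2 as (2.32) requires, and the sign ambiguities in E and b are resolved with a positive-depth test that is an assumption of this sketch. The scene and motion are synthetic and noise-free.

```python
import numpy as np


def cross_matrix(b):
    """B_x of equation (2.26): cross_matrix(b) @ v == np.cross(b, v)."""
    return np.array([[0.0, -b[2], b[1]], [b[2], 0.0, -b[0]], [-b[1], b[0], 0.0]])


def cofactor(M):
    """Matrix of cofactors: column k is the cross product of the other two columns."""
    c0, c1, c2 = M[:, 0], M[:, 1], M[:, 2]
    return np.column_stack((np.cross(c1, c2), np.cross(c2, c0), np.cross(c0, c1)))


def essential_from_points(r, l):
    """Solve a_i^T e = 0 (2.30) for E, up to sign, from N >= 8 ray pairs (rows of r and l)."""
    A = np.einsum("nk,nj->nkj", r, l).reshape(len(r), 9)   # a_i[3k + j] = r_i[k] * l_i[j]
    e = np.linalg.svd(A)[2][-1]        # right singular vector with the smallest singular value
    E = e.reshape(3, 3).T              # e_1..e_9 fill E column by column, as in (2.28)
    return E * np.sqrt(2.0 / np.trace(E.T @ E))   # scale so that E^T E = I - b b^T


def positive_depths(R, b, r, l):
    """Count pairs for which Z_r r_i = Z_l R l_i + b gives Z_r > 0 and Z_l > 0."""
    count = 0
    for ri, li in zip(r, l):
        (z_r, z_l), *_ = np.linalg.lstsq(np.column_stack((ri, -(R @ li))), b, rcond=None)
        count += int(z_r > 0 and z_l > 0)
    return count


def motion_from_essential(E, r, l):
    """Baseline from (2.31), rotation from (2.33); signs chosen by positive depth (an assumption)."""
    b0 = np.linalg.eigh(E.T @ E)[1][:, 0]   # eigenvector for the (near-)zero eigenvalue
    candidates = []
    for Es in (E, -E):                       # E is only known up to sign ...
        for b in (b0, -b0):                  # ... and so is b
            R = cross_matrix(b) @ Es.T + cofactor(Es.T)   # equation (2.33)
            candidates.append((positive_depths(R, b, r, l), R, b))
    _, R, b = max(candidates, key=lambda c: c[0])
    return R, b


# Synthetic, noise-free example with a hypothetical motion and 8 random scene points.
rng = np.random.default_rng(1)
axis = np.array([0.2, 1.0, 0.1]); axis /= np.linalg.norm(axis)
W = cross_matrix(axis)
R_true = np.eye(3) + np.sin(0.1) * W + (1 - np.cos(0.1)) * W @ W   # equivalent to (2.17)
b_true = np.array([1.0, 0.2, 0.1]); b_true /= np.linalg.norm(b_true)
P_l = np.column_stack((rng.uniform(-1, 1, 8), rng.uniform(-1, 1, 8), rng.uniform(3, 6, 8)))
P_r = P_l @ R_true.T + b_true                                      # equation (2.4)
r, l = P_r / P_r[:, 2:], P_l / P_l[:, 2:]                          # homogeneous rays (2.14)

R_est, b_est = motion_from_essential(essential_from_points(r, l), r, l)
```

The four sign combinations correspond to the fourfold ambiguity of the decomposition; only one of them places the reconstructed points in front of both cameras.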

2.2.3 Least-squares methods 

If there is no error in the data, the Longuet-Higgins algorithm will give a unique solution
for the motion² except for certain configurations of points which lie on special surfaces [8],
[28]. It is extremely difficult, however, to obtain error-free data, particularly if the corre- 
spondences are determined by an automatic procedure. It turns out that the 8-point linear 
algorithm is extremely unstable in the presence of noise, due largely to the fact that the 
equations (2.30) do not take into account dependencies between the elements of E, and 
hence their solution cannot be decomposed into the product form of equation (2.27). 

Even when nonlinear methods are used to solve the coplanarity constraint equations,
the solution is very sensitive to noise when few correspondences are used [29]. With error
in the data, the ray pairs are not exactly coplanar and equation (2.13) should be written as

$$R\boldsymbol{\ell}_i \cdot (\mathbf{r}_i \times \mathbf{b}) = \lambda_i \tag{2.34}$$

Instead of trying to solve the constraint equations exactly, it is better to find the solution
that minimizes the error norm

$$S = \sum_{i=1}^{N} \lambda_i^2 \tag{2.35}$$

A somewhat improved approach over the 8-point algorithm was proposed by Weng et 
al. [30] based on a modification of a method originally presented by Tsai and Huang [28]. 



² There is an intrinsic fourfold ambiguity to every solution; however, these are all counted as one. This
ambiguity will be discussed in more detail in Chapter 7.
They defined an N × 9 matrix A as

$$A = \begin{pmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_N^T \end{pmatrix} \tag{2.36}$$

such that $S = |A\mathbf{e}|^2$. The vector e which minimizes S, subject to $|\mathbf{e}| = 1$, is the eigenvector of $A^TA$ with
the smallest eigenvalue. The baseline direction and rotation are derived from the resulting
matrix E such that b minimizes $|E^TE\,\mathbf{b}|$ and R is the orthonormal rotation matrix that
minimizes $|E - R^TB_\times|$.

This method, however, also neglects the dependencies between the elements of E, and 
consequently is still very sensitive to errors in the data. The matrix formed from the 
elements of the vector e that minimizes $|A\mathbf{e}|$ is not necessarily close to the product of the
matrices $R^T$ and $B_\times$ which correspond to the true motion.
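The least-squares procedure of Weng et al. described above can be sketched in NumPy as follows. The orthogonal Procrustes (SVD) solution for the rotation minimizing $|E - R^TB_\times|$, and the positive-depth test used to settle the signs of E and b, are standard techniques chosen for this illustration, not necessarily those of [30]; the synthetic scene, motion, and noise level are hypothetical.

```python
import numpy as np


def cross_matrix(b):
    """B_x of equation (2.26)."""
    return np.array([[0.0, -b[2], b[1]], [b[2], 0.0, -b[0]], [-b[1], b[0], 0.0]])


def least_squares_E(r, l):
    """e minimizing S = |A e|^2 with |e| = 1: eigenvector of A^T A with the smallest eigenvalue."""
    A = np.einsum("nk,nj->nkj", r, l).reshape(len(r), 9)   # rows a_i^T of (2.36)
    e = np.linalg.eigh(A.T @ A)[1][:, 0]
    return e.reshape(3, 3).T                                 # column-wise ordering of (2.28)


def procrustes_rotation(E, b):
    """Rotation R minimizing the Frobenius norm |E - R^T B_x| (orthogonal Procrustes problem)."""
    U, _, Vt = np.linalg.svd(E @ cross_matrix(b).T)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])          # keep det(R) = +1
    return (U @ D @ Vt).T


def decompose(E, r, l):
    """b minimizing |E^T E b|, R by Procrustes; signs chosen by a positive-depth count (assumed)."""
    b0 = np.linalg.eigh(E.T @ E)[1][:, 0]
    best = None
    for Es in (E, -E):
        for b in (b0, -b0):
            R = procrustes_rotation(Es, b)
            z = [np.linalg.lstsq(np.column_stack((ri, -(R @ li))), b, rcond=None)[0]
                 for ri, li in zip(r, l)]
            score = sum(int(zr > 0 and zl > 0) for zr, zl in z)
            if best is None or score > best[0]:
                best = (score, R, b)
    return best[1], best[2]


# Hypothetical scene: 60 points, small rotation, unit baseline, slight image noise.
rng = np.random.default_rng(2)
axis = np.array([0.2, 1.0, 0.1]); axis /= np.linalg.norm(axis)
W = cross_matrix(axis)
R_true = np.eye(3) + np.sin(0.1) * W + (1 - np.cos(0.1)) * W @ W
b_true = np.array([1.0, 0.2, 0.1]); b_true /= np.linalg.norm(b_true)
N = 60
P_l = np.column_stack((rng.uniform(-1, 1, N), rng.uniform(-1, 1, N), rng.uniform(3, 6, N)))
P_r = P_l @ R_true.T + b_true
r, l = P_r / P_r[:, 2:], P_l / P_l[:, 2:]
r[:, :2] += rng.normal(scale=5e-5, size=(N, 2))    # hypothetical measurement noise
l[:, :2] += rng.normal(scale=5e-5, size=(N, 2))

R_est, b_est = decompose(least_squares_E(r, l), r, l)
```

Note that the eigenvector of $A^TA$ is computed without enforcing the product structure $E = R^TB_\times$; the projection onto a valid rotation and baseline happens only afterwards, which is the weakness pointed out above.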

Several researchers have pointed out the problems of computing motion by unconstrained 
minimization of the error [12], [3]. Horn [3] proposed the most general algorithm to solve 
the direct nonlinear constrained optimization problem iteratively. This method was later 
revised and reformulated in [4] using unit quaternions. 

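The vectors $\mathbf{r}_i$, $\boldsymbol{\ell}_i$, and b are given in quaternion form by

$$\mathring{r}_i = (0, \mathbf{r}_i), \qquad \mathring{\ell}_i = (0, \boldsymbol{\ell}_i), \qquad \mathring{b} = (0, \mathbf{b}) \tag{2.37}$$

while that of $\boldsymbol{\ell}'_i = R\boldsymbol{\ell}_i$ is given by

$$\mathring{\ell}'_i = \mathring{q}\,\mathring{\ell}_i\,\mathring{q}^* \tag{2.38}$$

Using the identity (A.10) given in Appendix A, the triple product $\lambda_i$ (2.34) can be
written as

$$\begin{aligned}
\lambda_i &= \mathring{r}_i\mathring{b} \cdot \mathring{q}\,\mathring{\ell}_i\,\mathring{q}^* \\
&= \mathring{r}_i\mathring{b}\mathring{q} \cdot \mathring{q}\,\mathring{\ell}_i \\
&= \mathring{r}_i\mathring{d} \cdot \mathring{q}\,\mathring{\ell}_i
\end{aligned} \tag{2.39}$$

where $\mathring{d} = \mathring{b}\mathring{q}$.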
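The quaternion form of the triple product in (2.39) is easy to check numerically. In this NumPy sketch (quaternions stored scalar-first; the random rays and motion are for illustration only) the vector triple product of (2.34) is compared with the last line of (2.39). The sketch also confirms that $\mathring{q}$ and $\mathring{d} = \mathring{b}\mathring{q}$ automatically have unit length and are orthogonal when $|\mathbf{b}| = 1$.

```python
import numpy as np


def qmul(p, q):
    """Quaternion product (p0, p)(q0, q) = (p0 q0 - p.q, p0 q + q0 p + p x q)."""
    return np.concatenate(([p[0] * q[0] - p[1:] @ q[1:]],
                           p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])))


def qconj(q):
    """Conjugate (q0, -q)."""
    return np.concatenate(([q[0]], -q[1:]))


def pure(v):
    """Quaternion form (0, v) of an ordinary 3-vector, as in (2.37)."""
    return np.concatenate(([0.0], v))


rng = np.random.default_rng(3)
w = rng.normal(size=3); w /= np.linalg.norm(w)
theta = 0.4
q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * w))   # unit quaternion (2.23)
b = rng.normal(size=3); b /= np.linalg.norm(b)                      # |b| = 1
r, l = rng.normal(size=3), rng.normal(size=3)                       # arbitrary rays

Rl = qmul(qmul(q, pure(l)), qconj(q))[1:]          # R l via (2.38)
lam_vector = Rl @ np.cross(r, b)                    # triple product (2.34)
d = qmul(pure(b), q)                                # d = b q
lam_quaternion = qmul(pure(r), d) @ qmul(q, pure(l))   # last line of (2.39)
```

Because $\lambda_i$ is bilinear in $\mathring{d}$ and $\mathring{q}$, working with this pair rather than with R and b keeps the error expression polynomial in the unknowns.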
In the latter version of Horn's algorithm, S, the sum of squared errors, is written as a
first order perturbation about a given $\mathring{d}$ and $\mathring{q}$. The idea is to find the incremental changes
$\delta\mathring{d}$ and $\delta\mathring{q}$ which minimize the linearized error subject to the constraints

$$\mathring{q}\cdot\mathring{q} = 1, \qquad \mathring{d}\cdot\mathring{d} = 1, \qquad \mathring{q}\cdot\mathring{d} = 0 \tag{2.40}$$



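The updated vectors $\mathring{q} + \delta\mathring{q}$ and $\mathring{d} + \delta\mathring{d}$ must also satisfy these conditions and, neglecting
second order terms, this results in the incremental constraints

$$\mathring{q}\cdot\delta\mathring{q} = 0, \qquad \mathring{d}\cdot\delta\mathring{d} = 0, \qquad \mathring{q}\cdot\delta\mathring{d} + \mathring{d}\cdot\delta\mathring{q} = 0 \tag{2.41}$$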



Differentiating the constrained objective function with respect to $\delta\mathring{q}$, $\delta\mathring{d}$, and the Lagrange
multipliers λ, μ, ν associated with each of the constraints (2.41) and setting the result to
zero results in a linear system of equations of the form

$$J \begin{pmatrix} \delta\mathring{q} \\ \delta\mathring{d} \\ \lambda \\ \mu \\ \nu \end{pmatrix} = \mathbf{h} \tag{2.42}$$

where the matrix J and the vector h are both known, given the current values of $\mathring{q}$ and $\mathring{d}$
(see [4] for details). Equation (2.42) can thus be solved for the 11 unknowns, which are the
four components each of $\delta\mathring{q}$ and $\delta\mathring{d}$ and the three Lagrange multipliers. After updating $\mathring{q}$ and
$\mathring{d}$ with the new increments $\delta\mathring{q}$ and $\delta\mathring{d}$, the procedure can be repeated until the percentage
change in the total error falls below some limit. This algorithm has been shown to be very
accurate and efficient in most cases for estimating motion, even with noisy correspondence
data, as long as there is a sufficiently large number of matches. As presented, however, it
is too complex to be implemented efficiently on a simple processor, given the need to solve 
an 11x11 system of equations at each iteration. We will present a simplified adaptation of 
this algorithm in Chapter 7. 
2.3 Correspondenceless Methods

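If the displacements in the image are small, they can be approximated by the time
derivatives of the position vectors to points in the scene. For small θ, cos θ ≈ 1, sin θ ≈ θ,
and equation (2.17) reduces to

$$R \approx I + \theta\,\Omega_\times \tag{2.43}$$

The equation of rigid body motion (2.4) then becomes

$$\mathbf{p}_r = R\mathbf{p}_l + \mathbf{b} \approx \mathbf{p}_l + \theta(\hat{\boldsymbol{\omega}}\times\mathbf{p}_l) + \mathbf{b} \tag{2.44}$$

Let $\dot{\mathbf{p}}$ denote the time derivative of the vector p, which can be approximated as $\dot{\mathbf{p}} \approx \mathbf{p}_l - \mathbf{p}_r$.
From (2.44)

$$\dot{\mathbf{p}} = -\theta(\hat{\boldsymbol{\omega}}\times\mathbf{p}) - \mathbf{b} \tag{2.45}$$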

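which, expanded into component form, results in

$$\begin{pmatrix} \dot X \\ \dot Y \\ \dot Z \end{pmatrix} = \begin{pmatrix} \theta(\omega_z Y - \omega_y Z) - b_x \\ \theta(\omega_x Z - \omega_z X) - b_y \\ \theta(\omega_y X - \omega_x Y) - b_z \end{pmatrix} \tag{2.46}$$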



Image plane displacements are given by

$$u = \frac{dx}{dt} \approx x_l - x_r, \qquad v = \frac{dy}{dt} \approx y_l - y_r \tag{2.47}$$



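From the equations of perspective projection (2.1) we have

$$\frac{dx}{dt} = f\,\frac{d}{dt}\!\left(\frac{X}{Z}\right) = \frac{f}{Z}\left(\dot X - \frac{X}{Z}\dot Z\right) \tag{2.48}$$

and

$$\frac{dy}{dt} = \frac{f}{Z}\left(\dot Y - \frac{Y}{Z}\dot Z\right) \tag{2.49}$$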



Combining (2.46) through (2.49), we obtain the equations for the incremental optical flow

$$ u = \frac{-f b_x + x b_z}{Z} + \frac{\theta}{f}\left[\omega_x xy - \omega_y (x^2 + f^2) + \omega_z y f\right] \tag{2.50} $$

$$ v = \frac{-f b_y + y b_z}{Z} - \frac{\theta}{f}\left[\omega_y xy - \omega_x (y^2 + f^2) + \omega_z x f\right] \tag{2.51} $$

first derived by Longuet-Higgins and Prazdny [31].
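As a sketch of how (2.50) and (2.51) are evaluated, the hypothetical Python function below computes the flow at one image point from an assumed motion $(\mathbf{b}, \hat{\boldsymbol{\omega}}, \theta)$ and depth Z. The function name and argument order are illustrative only; image coordinates are assumed to be measured in pixels from the principal point.

```python
def incremental_flow(x, y, f, Z, b, omega, theta):
    """Image-plane displacement (u, v) from equations (2.50)-(2.51).

    x, y  : image coordinates (pixels, origin at the principal point)
    f     : effective focal length (pixels)
    Z     : depth of the scene point
    b     : translation (b_x, b_y, b_z)
    omega : unit rotation axis (w_x, w_y, w_z)
    theta : small rotation angle (radians)
    """
    bx, by, bz = b
    wx, wy, wz = omega
    # Translational part: the only term that depends on depth Z.
    u_t = (-f * bx + x * bz) / Z
    v_t = (-f * by + y * bz) / Z
    # Rotational part: independent of depth.
    u_r = (theta / f) * (wx * x * y - wy * (x * x + f * f) + wz * y * f)
    v_r = -(theta / f) * (wy * x * y - wx * (y * y + f * f) + wz * x * f)
    return u_t + u_r, v_t + v_r
```

Only the translational term involves Z, and scaling Z and $\mathbf{b}$ by the same factor leaves (u, v) unchanged.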

If it is assumed that the brightness E of a point in the image does not change as the
point moves, then the total derivative of brightness with time must be zero, that is,

$$ \frac{dE}{dt} = \frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = E_x u + E_y v + E_t = 0 \tag{2.52} $$

Equation (2.52) is known as the brightness change constraint equation.
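The constraint (2.52) can be checked numerically. The sketch below assumes a made-up smooth pattern translating rigidly with constant image velocity (u, v), estimates $E_x$, $E_y$ and $E_t$ with central differences, and evaluates the left side of (2.52), which should be close to zero. The pattern, the step size and the function names are arbitrary choices for illustration.

```python
import math

def pattern(x, y, t, u, v):
    """A smooth brightness pattern translating rigidly with velocity (u, v)."""
    xs, ys = x - u * t, y - v * t
    return math.sin(0.3 * xs) + 0.5 * math.cos(0.2 * ys + 0.1 * xs)

def constraint_residual(x, y, t, u, v, h=1e-4):
    """E_x*u + E_y*v + E_t, with derivatives from central differences, cf. (2.52)."""
    Ex = (pattern(x + h, y, t, u, v) - pattern(x - h, y, t, u, v)) / (2 * h)
    Ey = (pattern(x, y + h, t, u, v) - pattern(x, y - h, t, u, v)) / (2 * h)
    Et = (pattern(x, y, t + h, u, v) - pattern(x, y, t - h, u, v)) / (2 * h)
    return Ex * u + Ey * v + Et
```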

There are two approaches to using these equations. The first, and earliest proposed, is 
to compute the optical flow over the entire image and to invert (2.50) and (2.51) to find the 
global motion and depth at each pixel. The second, known as the direct approach, skips the 
computation of the optical flow and uses only the constant brightness assumption combined 
with the incremental rigid body equations. Neither approach requires finding explicit point
correspondences.

2.3.1 Optical flow 

Horn and Schunck developed the first algorithm for determining optical flow from local
image brightness derivatives [32], based on minimizing the error in the brightness change
constraint equation

$$ e_b = E_x u + E_y v + E_t \tag{2.53} $$

Since there are two unknowns at each pixel, the constant brightness assumption is not
sufficient to determine u and v uniquely, and a second constraint is required. Horn and
Schunck chose the smoothness of the optical flow and added a second error term

$$ e_s^2 = |\nabla u|^2 + |\nabla v|^2 \tag{2.54} $$

The total error to be minimized is therefore

$$ e_{tot}^2 = \iint \left(e_b^2 + \lambda^2 e_s^2\right) dx\,dy \tag{2.55} $$

where $\lambda^2$ is a penalty term that weights the relative importance of the two constraints.
The functions u and v which minimize $e_{tot}^2$ for a given $\lambda$ can be found using the calculus of
variations.
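A minimal sketch of the classic iterative scheme for minimizing (2.55) follows. It assumes the brightness derivatives are already available on a grid. It replaces the Laplacian by the difference between each value and the average of its four neighbours (the original Horn and Schunck paper used a weighted eight-neighbour average) and absorbs the resulting discretization constant into `lam2`, so `lam2` is only proportional to the $\lambda^2$ of (2.55) and must be positive.

```python
def horn_schunck(Ex, Ey, Et, lam2, iterations=100):
    """Iteratively minimize a discretized form of (2.55).

    Ex, Ey, Et : brightness derivatives as lists of rows (same shape)
    lam2       : smoothness weight (proportional to lambda^2), must be > 0
    Returns (u, v) as lists of rows.
    """
    rows, cols = len(Ex), len(Ex[0])
    u = [[0.0] * cols for _ in range(rows)]
    v = [[0.0] * cols for _ in range(rows)]

    def local_mean(a, i, j):
        # Average of the four neighbours, replicating values at the border.
        return 0.25 * (a[max(i - 1, 0)][j] + a[min(i + 1, rows - 1)][j]
                       + a[i][max(j - 1, 0)] + a[i][min(j + 1, cols - 1)])

    for _ in range(iterations):
        new_u = [[0.0] * cols for _ in range(rows)]
        new_v = [[0.0] * cols for _ in range(rows)]
        for i in range(rows):
            for j in range(cols):
                ub, vb = local_mean(u, i, j), local_mean(v, i, j)
                ex, ey, et = Ex[i][j], Ey[i][j], Et[i][j]
                # Move the smoothed estimate toward the constraint line
                # E_x u + E_y v + E_t = 0, along the gradient direction.
                k = (ex * ub + ey * vb + et) / (lam2 + ex * ex + ey * ey)
                new_u[i][j] = ub - ex * k
                new_v[i][j] = vb - ey * k
        u, v = new_u, new_v
    return u, v
```

With uniform derivatives the iteration converges to the point on the constraint line nearest the origin (the normal flow), which illustrates why the smoothness term alone cannot resolve the aperture problem in featureless regions.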

One problem with computing optical flow by applying a smoothness constraint is that 
the flow is not smooth at boundaries between objects at different depths. The global 
optimization procedure causes errors generated at depth discontinuities to propagate to 
neighboring regions [33]. Segmenting the optical flow at depth discontinuities would appear 
to be the solution to this problem except that one does not know a priori where they are. 
Murray and Buxton [34] proposed incorporating discontinuities by adding line processes 
to the objective function, using an idea originated by Geman and Geman for segmenting 
gray-scale images by simulated annealing [35]. The resulting optimization problem is non- 
convex, however, and requires special procedures to converge to a global minimum energy 
state. 

Once the optical flow is determined it is necessary to solve equations (2.50) and (2.51) to 
find motion and depth. As was the case for the explicit methods, absolute distances cannot 
be recovered since scaling Z and b by the same factor has no effect on u and v. Longuet- 
Higgins and Prazdny [31] showed how motion and depth parameters could be determined 
from the first and second derivatives of the optical flow after first computing the location 
of the epipole. Heeger et al. [36] proposed a method to recover the motion by applying 
rotation insensitive center-surround operators that allow the translational and rotational 
components of the motion to be determined separately. Ambiguities in interpreting the 
optical flow in the case of special surfaces have been analyzed in [37], [38], [39], and [40]. 

2.3.2 Direct methods 

The method of Horn and Schunck, or one of its variations, requires a great deal of 
computation to determine the optical flow — which is only an intermediate step in obtain- 
ing the actual parameters of interest. The direct approach of Horn and Weldon [41] and 
Negahdaripour and Horn [42] avoids computing optical flow by substituting u and v from 
equations (2.50) and (2.51) directly into (2.52). The brightness change constraint equation 
is thus expressed as 

$$ (\mathbf{v} \cdot \hat{\boldsymbol{\omega}})\,\theta + \frac{\mathbf{s} \cdot \mathbf{b}}{Z} = -E_t \tag{2.56} $$






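where

$$ \mathbf{s} = \begin{pmatrix} -f E_x \\ -f E_y \\ x E_x + y E_y \end{pmatrix} \tag{2.57} $$

and

$$ \mathbf{v} = \begin{pmatrix} f E_y + y(x E_x + y E_y)/f \\ -f E_x - x(x E_x + y E_y)/f \\ y E_x - x E_y \end{pmatrix} \tag{2.58} $$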
Note that the vectors s and v are entirely computable from measurements in the image. 
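Because $\mathbf{s}$ and $\mathbf{v}$ depend only on image measurements, they can be formed at every pixel before the motion is known. The sketch below (hypothetical function names; $\hat{\boldsymbol{\omega}}$ is assumed to be a unit vector and $\theta$ small) builds them from (2.57) and (2.58) and predicts $E_t$ from (2.56) for an assumed motion and depth.

```python
def direct_vectors(Ex, Ey, x, y, f):
    """Vectors s (2.57) and v (2.58) from the brightness gradient at pixel (x, y)."""
    g = x * Ex + y * Ey
    s = (-f * Ex, -f * Ey, g)
    v = (f * Ey + y * g / f, -f * Ex - x * g / f, y * Ex - x * Ey)
    return s, v

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def predicted_Et(Ex, Ey, x, y, f, Z, b, omega, theta):
    """E_t implied by (2.56): -((v . omega) * theta + (s . b) / Z)."""
    s, v = direct_vectors(Ex, Ey, x, y, f)
    return -(dot(v, omega) * theta + dot(s, b) / Z)
```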

Assuming the image contains N pixels, there are N + 5 unknowns in equation (2.56):
the five independent parameters of $\mathbf{b}$ and $\hat{\boldsymbol{\omega}}\theta$ (recall that $\mathbf{b}$ is a unit vector due to the scale
factor ambiguity), and the N depth values Z. Since there is only one equation (2.56) for each 
pixel, the problem is mildly underconstrained. Given two images it can be solved only for a 
few special cases in which either the motion or the surface structure is restricted. With more 
than two views of the same scene, however, the problem is no longer underconstrained [11]. 
It should be noted that it is never required to incorporate the assumption that the optical 
flow is smooth, and hence the problems associated with discontinuities in the flow are 
avoided. 

Several methods have been developed to solve the special cases where the problem is not 
underconstrained for two views. Three of these were developed by Negahdaripour and Horn 
who gave a closed form solution for motion with respect to a planar surface [43]; showed how 
the constraint that depth must be positive could be used to recover translational motion 
when the rotation is zero, or is known [44]; and derived a method for locating the focus of 
expansion [45]. Taalebinezhaad [46], [47] showed how motion and depth could be determined 
in the general case by fixating on a single point in the image. He essentially demonstrated 
that obtaining one point correspondence would provide enough information to enable the 
general problem to be solved. 



2.3.3 Limitations of correspondenceless methods 

Methods for computing motion and depth from the local spatio-temporal derivatives 
of image brightness must rely on specific assumptions in order to work. The most impor- 
tant of these, on which all of the methods just described are based, is that brightness is 
constant (2.52). Verri and Poggio [48] criticized differential methods on the grounds that
brightness constancy is often violated. Their arguments, however, were based on consid- 
ering shading effects which are important only for specular surfaces or when the motion 
is large enough to significantly affect surface orientation. Furthermore, these effects domi- 
nate only when the magnitude of the spatial brightness gradient is small. There are clearly 
cases, such as the rotating uniform sphere or the moving point light source, as pointed out 
by Horn [49] and others, in which the optical flow and the motion field are different. In 
areas of the image where the brightness derivatives are small, it is difficult to constrain the 
motion or to determine depth. However, this problem is not specific to differential methods. 
Gennert and Negahdaripour [50] investigated the use of a linear transformation model to 
account for brightness changes due to shading effects on lightly textured surfaces. Their 
method was applied only to computing optical flow and involved modifying the objective 
function (2.55) to add new constraints. Direct methods do not lend themselves as easily 
to relaxing the brightness constancy assumption. With these it is simpler to ignore areas 
where the spatial derivatives are small. 

One of the more important assumptions underlying differential methods is that the 
interframe motion must be small so that the approximations (2.43)-(2.45) will be valid, 
and so that the spatial and temporal sampling rates will not violate the Nyquist criterion. 

It is useful to perform some sample calculations to see what is meant by "small". The
approximations $\sin\theta \approx \theta$ and $\cos\theta \approx 1$ are accurate to within 1.5% up to about 10° of rotation.
Approximations (2.45) and (2.47), which express the derivatives of the position vector as
the difference between the left and right rays, and which incorporate the approximation
$R \approx I + \theta\,\Omega_\times$, are thus reasonable as long as the velocity of the point in the scene is
constant between frames and $\theta < 10°$. These conditions should not be difficult to achieve
with video-rate motion sequences. The angular restriction may rule out some binocular
stereo arrangements, however.
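The accuracy figures quoted above are easy to reproduce. This short sketch computes the relative errors of the two small-angle approximations for a few angles; the choice of angles and the function name are arbitrary.

```python
import math

def small_angle_errors(deg):
    """Relative errors of sin(t) ~ t and cos(t) ~ 1 at an angle given in degrees."""
    t = math.radians(deg)
    sin_err = abs(t - math.sin(t)) / math.sin(t)
    cos_err = abs(1.0 - math.cos(t)) / math.cos(t)
    return sin_err, cos_err

for deg in (1, 5, 10, 15):
    s, c = small_angle_errors(deg)
    print(f"{deg:2d} deg: sin error {100 * s:.2f}%, cos error {100 * c:.2f}%")
```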

The primary concern is thus not the validity of the incremental optical flow equations,
but whether the sampling rates are high enough to avoid aliasing. The Nyquist criterion,
which sets the minimum rate at which an image sequence must be sampled in space and
time, can be derived as follows.

The constant brightness assumption requires that

$$ E_x u + E_y v + E_t = 0 \tag{2.59} $$
If the Fourier transform of E(x, y, t) is given by

$$ E(x, y, t) \leftrightarrow \mathcal{E}(\xi, \eta, \nu) \tag{2.60} $$

then

$$ E_x u + E_y v + E_t \leftrightarrow j(\xi u + \eta v + \nu)\,\mathcal{E}(\xi, \eta, \nu) \tag{2.61} $$

If the constant brightness assumption is valid, then by linearity of the Fourier transform,
either

$$ j(\xi u + \eta v + \nu) = 0 \tag{2.62} $$

or

$$ |\mathcal{E}(\xi, \eta, \nu)| = 0 \tag{2.63} $$

for all $\xi$, $\eta$, $\nu$.

If E(x, y) is bandlimited so that $|\xi| < \Omega_x$ and $|\eta| < \Omega_y$, then (2.62) requires that $|\nu| < \Omega_t$,
where

$$ \Omega_t = \Omega_x |u| + \Omega_y |v| \tag{2.64} $$

Note that $\Omega_x$, $\Omega_y$, and $\Omega_t$ denote frequency bandwidths and should not be confused with
the matrix $\Omega_\times$.

To avoid aliasing, the temporal sampling rate r must satisfy $r > 2\Omega_t$. However, r often
cannot be changed, for instance in video sequences where images are produced at a rate
of 30 frames/sec.³ Although it is possible to design video cameras to operate at higher
rates, other factors, such as the amount of available light or interframe processing time 
requirements, may limit how far one can go. 

For a given r, the Nyquist criterion thus imposes a restriction on the maximum image
plane displacement which can be tolerated. If the spatial bandwidths $\Omega_x$ and $\Omega_y$ are
approximately the same, so that we can set $\Omega_x = \Omega_y = \Omega_s$, then

$$ r > 2\Omega_t = 2\Omega_s (|u| + |v|)_{max} \tag{2.65} $$



³ According to the American NTSC standard. Most countries outside North America and Japan use the
PAL or SECAM standards, which produce 25 frames/sec.
or,

$$ (|u| + |v|)_{max} < \frac{r}{2\Omega_s} \tag{2.66} $$

The quantities u/r and v/r have units of pixels/frame, while $\Omega_s$ has units of 1/pixel.
Since the images are spatially sampled, it is also true that $0 < |\Omega_s| < 1/2$.
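Treating $\Omega_s$ as a frequency in cycles per pixel (so that $0 < \Omega_s \le 1/2$) and taking one frame as the unit of time (r = 1), the bound (2.66) becomes a one-line calculation. The sketch below also rearranges the same inequality to give the largest spatial bandwidth that tolerates a given displacement. Both function names are hypothetical.

```python
def max_displacement(omega_s, r=1.0):
    """Upper bound on (|u| + |v|)_max from (2.66).

    omega_s : spatial bandwidth in cycles per pixel (0 < omega_s <= 1/2)
    r       : temporal sampling rate; with r = 1 the result is in pixels/frame
    """
    if not 0.0 < omega_s <= 0.5:
        raise ValueError("omega_s must lie in (0, 1/2] cycles per pixel")
    return r / (2.0 * omega_s)

def max_bandwidth(displacement, r=1.0):
    """The same inequality solved for the spatial bandwidth (cycles per pixel)."""
    return r / (2.0 * displacement)
```

At the sampling limit ($\Omega_s = 1/2$) only 1 pixel/frame is tolerated, while a displacement of about 9 pixels/frame requires a bandwidth below roughly 0.056 cycles per pixel.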

If the image is highly textured and contains significant energy in frequencies near the 
upper limit, the maximum tolerable displacement will be around 1 to 2 pixels per frame. In 
most cases, disparities in binocular stereo pairs will be greater than this. It is worthwhile to 
compute some typical displacements in images generated by a moving camera. From (2.50) 
and (2.51) we can find the optical flow for the following special cases: 

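1. Pure translation along the x direction

$$ (u, v) = -\frac{b_x}{Z}(f, 0) \tag{2.67} $$

2. Pure translation along the z direction

$$ (u, v) = \frac{b_z}{Z}(x, y) \tag{2.68} $$

3. Pure rotation of $\theta$ about $\hat{\boldsymbol{\omega}} = \hat{\mathbf{y}}$

$$ (u, v) = -\frac{\theta}{f}\left(x^2 + f^2,\; xy\right) \tag{2.69} $$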

In normal imaging systems the effective focal length f is several hundred times longer than
the interpixel spacing. For the purpose of calculating displacements, let f = 200 pixels. For
the first case, pure translation in the x direction, suppose the camera is moving at 30 mph
(48 km/hr) and viewing an object at a distance of 10 m while generating a video sequence
at 30 frames/sec. This could be the situation of a camera attached to a car door viewing
the side of the road. In 1/30 sec, the camera has moved 0.444 m in the x direction. We thus
find

$$ (u, v) = -\frac{0.444}{10}(200, 0) = (-8.9, 0) \tag{2.70} $$

which is considerably larger than the maximum allowed displacement.

In the second case, pure translation along the z direction, we can identify the quantity
$b_z/Z$, measured per unit time, as $1/T$, where T is the time-to-impact of the object being viewed. Suppose T = 10
sec and the frame interval is still 1/30 sec, so that over one frame $b_z/Z = (1/30)/10 = 1/300$; then

$$ (u, v) = \frac{1}{300}(x, y) \tag{2.71} $$

For most, or all, of the image, i.e., $|x|, |y| < 300$, the displacement will be less than 1 pixel,
and hence there should be no problem in applying differential methods to compute the time
to crash. (For T < 10 sec, the displacements may become too large, but at that point it
may be too late to care.)

In the last case, pure rotation about y, the motion depends only on $\theta$ and the position
in the image. The smallest displacement occurs at the principal point, x = 0, y = 0, where

$$ (u, v) = -\theta(f, 0) \tag{2.72} $$

Every 1° (0.0175 radians) of rotation corresponds to a displacement of 3.5 pixels, with f = 200
as before. At 30 frames/sec, a rotation of 1° per frame corresponds to 30° per second, which is easily exceeded by
ordinary vibrations from moving the camera.
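The three worked examples above can be reproduced with a few lines of arithmetic. The sketch assumes, as the text does, f = 200 pixels, a frame interval of 1/30 sec, 48 km/hr for the first case, T = 10 sec for the second, and 1° of rotation per frame for the third. Only the magnitudes matter here.

```python
import math

F = 200.0           # effective focal length in pixels, as assumed in the text
FRAME = 1.0 / 30.0  # frame interval in seconds

# Case 1: translation along x at 48 km/hr, object 10 m away, cf. (2.70).
speed = 48.0 / 3.6                 # km/hr -> m/s
bx = speed * FRAME                 # metres travelled in one frame
u1 = -bx / 10.0 * F

# Case 2: translation along z with time-to-impact T = 10 s, cf. (2.71).
T = 10.0
u2_at_300 = (FRAME / T) * 300.0    # displacement at x = 300 pixels

# Case 3: rotation of 1 degree about y, at the principal point, cf. (2.72).
u3 = -math.radians(1.0) * F

print(f"case 1: {u1:.1f} px, case 2: {u2_at_300:.2f} px, case 3: {u3:.2f} px")
```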

When the displacements are too large for the given frame rate and sensor dimensions,
the situation can be remedied by low-pass filtering the image, or equivalently, by reducing
the spatial sampling rate. Rewriting (2.66) as

$$ \Omega_s < \frac{r}{2(|u| + |v|)_{max}} \tag{2.73} $$

gives the maximum bandwidth for a given sampling rate and maximum image plane displacement.

Unfortunately, it is not possible to know in advance what $(|u| + |v|)_{max}$ will be. If
the resolution is set lower than necessary, useful information is lost, while if it is set too 
high, aliasing will occur. More insidiously, one cannot determine from the local brightness 
gradients alone if aliasing has occurred. One solution proposed by Anandan [51] is to 
perform motion estimation at multiple scales by separating the image into a hierarchical 
pyramid structure in which each level represents a different spatial bandwidth and sampling 
rate. Information can thus propagate from coarse to fine levels to determine the highest 
resolution at which optical flow can be computed from brightness gradients. 
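A coarse-to-fine scheme needs a set of progressively low-pass-filtered and subsampled images. The sketch below builds such a pyramid with simple 2×2 block averaging. This is an assumed, deliberately crude filter and not necessarily the one used in [51]. Because each level halves the sampling rate, a displacement that satisfies (2.66) at a given level corresponds to twice as many full-resolution pixels as at the level below. All function names are hypothetical.

```python
def downsample(img):
    """One pyramid level: average 2x2 blocks (a crude low-pass filter) and decimate."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [[0.25 * (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
                     + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1])
             for j in range(cols)] for i in range(rows)]

def build_pyramid(img, levels):
    """Return [level 0 (full resolution), level 1, ...], stopping below 2x2."""
    pyramid = [img]
    for _ in range(levels - 1):
        if len(pyramid[-1]) < 2 or len(pyramid[-1][0]) < 2:
            break
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def tolerable_displacement(level, base=1.0):
    """Full-resolution displacement tolerated at a level, assuming `base`
    pixels/frame are tolerated at level 0, cf. (2.66)."""
    return base * (2 ** level)
```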

The pyramid structure is not the most practical option for hardware implementation 
as it involves a great deal of processing, and there is not a simple alternative for finding 
the appropriate filter bandwidth using only gradient information. By combining explicit
matching and differential methods, however, it should be possible to devise a reliable sys- 
tem which takes advantage of the best of each. Explicit matching methods can determine 
the maximum image displacements and compute the motion parameters, while differen- 
tial methods can more easily obtain information that relates motion and structure from 
all parts of the image. For example, given the motion and assuming the image has been 
appropriately lowpass-filtered, depth can be computed from equation (2.56) as 

$$ Z = -\frac{\mathbf{s} \cdot \mathbf{b}}{E_t + \theta\,(\mathbf{v} \cdot \hat{\boldsymbol{\omega}})} \tag{2.74} $$
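Once the motion is known, (2.74) gives depth pixel by pixel. The following sketch (hypothetical names) forms $\mathbf{s}$ and $\mathbf{v}$ from (2.57) and (2.58). It returns None when the denominator is numerically zero, which in noise-free data happens when $\mathbf{s}\cdot\mathbf{b} = 0$, that is, when the brightness gradient is perpendicular to the translational component of the flow.

```python
def depth_from_motion(Ex, Ey, Et, x, y, f, b, omega, theta, eps=1e-12):
    """Depth Z at pixel (x, y) from (2.74), given the motion (b, omega, theta)."""
    g = x * Ex + y * Ey
    s = (-f * Ex, -f * Ey, g)                                        # (2.57)
    v = (f * Ey + y * g / f, -f * Ex - x * g / f, y * Ex - x * Ey)   # (2.58)
    denom = Et + theta * sum(vi * wi for vi, wi in zip(v, omega))
    if abs(denom) < eps:
        return None  # no usable translational signal: Z is not recoverable here
    return -sum(si * bi for si, bi in zip(s, b)) / denom
```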

In this thesis we are primarily concerned with the design of a system to compute relative 
motion from explicit point correspondences and will not explore further the benefits, or 
the details, of interfacing to other systems based on different approaches. It should be 
understood, however, that the choice of an explicit strategy does not rule out its use in 
conjunction with other methods. 

2.4 Motion Vision and VLSI 

The tremendous computational complexity of many of the algorithms for determining 
motion and the need to perform these computations in real time has led to the design of 
several specialized VLSI systems. Analog processing has been a major component in most 
of these as it offers the possibility of performing parallel operations on large amounts of
data with compact, low-power circuits. In [52], Horn presents the theory and gives several 
examples of useful computations which can be performed by analog networks. 

One of the first circuits was the correlating motion detector of Tanner and Mead [53] 
which was a simple 1-D detection circuit that allowed a maximum motion of ±1 pixel. 
A linear array of photodiodes converted incident light to a 1-bit signal which was com- 
pared, via a binary correlation circuit, to the stored signals from the previous cycle. The 
peak correlation value at each pixel was detected by mutual inhibition among neighboring 
comparators, and the output was summed on a global bus. 

A later design by Tanner implemented a 2-D non-clocked array of photosensors to com- 
pute optical flow by gradient descent on a feedback network [54], [55, Chapter 14]. This 
system was limited to constant flow, as would arise from a pure translation parallel to the 
image plane. In this case the error function (2.55) to be minimized reduces to

$$ e_{tot}^2 = \iint (E_x u + E_y v + E_t)^2\, dx\,dy \tag{2.75} $$

Since the flow is constant, the problem is solved by taking derivatives with respect to u and
v and setting these to zero. We find that

$$ \frac{\partial e_{tot}^2}{\partial u} = 2\iint (E_x u + E_y v + E_t)\, E_x\, dx\,dy \tag{2.76} $$

$$ \frac{\partial e_{tot}^2}{\partial v} = 2\iint (E_x u + E_y v + E_t)\, E_y\, dx\,dy \tag{2.77} $$

In the gradient descent approach, currents proportional to quantities in the integrals are 
fed into a negative feedback loop which drives the variable voltages representing u and v to 
values which force the derivatives to zero. This circuit was designed to operate in continuous 
time to avoid temporal aliasing. The first chip built was an 8x8 array with processors at 
each pixel to compute the local multiplications and two global busses to carry the values of 
u and v. 
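For constant flow, setting (2.76) and (2.77) to zero yields a 2×2 linear system that can be solved in closed form. The sketch below does so and, for comparison, also runs a discrete-time gradient descent that loosely mimics the feedback loop described above. The step size, iteration count and function names are arbitrary assumptions, and the code is not a model of Tanner's circuit.

```python
def constant_flow(Ex, Ey, Et):
    """Least-squares constant flow minimizing (2.75) over flat lists of samples.

    Setting (2.76) and (2.77) to zero gives the normal equations
        [Sxx Sxy] [u]   [-Sxt]
        [Sxy Syy] [v] = [-Syt]
    """
    Sxx = sum(a * a for a in Ex)
    Syy = sum(b * b for b in Ey)
    Sxy = sum(a * b for a, b in zip(Ex, Ey))
    Sxt = sum(a * c for a, c in zip(Ex, Et))
    Syt = sum(b * c for b, c in zip(Ey, Et))
    det = Sxx * Syy - Sxy * Sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients are (nearly) parallel: aperture problem")
    u = (-Sxt * Syy + Syt * Sxy) / det
    v = (-Syt * Sxx + Sxt * Sxy) / det
    return u, v

def constant_flow_descent(Ex, Ey, Et, step=0.01, iterations=5000):
    """Discrete-time analogue of the feedback loop: step u, v down the gradient."""
    u = v = 0.0
    for _ in range(iterations):
        r = [a * u + b * v + c for a, b, c in zip(Ex, Ey, Et)]
        du = 2 * sum(ri * a for ri, a in zip(r, Ex))   # (2.76)
        dv = 2 * sum(ri * b for ri, b in zip(r, Ey))   # (2.77)
        u -= step * du
        v -= step * dv
    return u, v
```

The descent converges only if the step is small relative to the largest eigenvalue of the normal-equation matrix; the continuous-time circuit avoids choosing a step at all.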

Other circuits developed by the Computation and Neural Systems Program group at 
Caltech are described by Horiuchi et al. in [56], where they discuss a comparative study
of four experimental designs for 1-D motion estimation. Among these were a 1-D version 
of Tanner's gradient descent optical flow chip and a fully digital circuit composed of off- 
the-shelf components to implement correlation. The other two designs were a pulse-coded 
correlation circuit (based on a model of structures found in the auditory system of owls) 
which detects time differences between neighboring pulses, and a mixed analog/digital sys- 
tem to track zero-crossings of a difference of Gaussians (DOG) filtered image. In their 
results, they report that the fully digital circuit, composed of a Fairchild Linear CCD 256 
pixel array and a Harris RTX2001A microprocessor, had the best performance in overall 
robustness, while the Tanner 1-D optical flow chip had the least reliable performance. They 
also reported difficulties using gradient methods due to the 120Hz flicker found in ordinary 
room lights. 

There has been a great deal of interest, motivated by the desire to reduce interchip 
communication requirements, in developing one-chip circuits that incorporate photosensing 
and local processing at each pixel [57]. With focal-plane processing, however, the area taken 
up by the processing circuitry increases pixel size and thereby reduces the maximum array
size which can be placed on the chip. Since technology limitations restrict the maximum
die size to about 1 cm², either resolution or field of view must be sacrificed.

Gottardi and Yang [58] recently reported the development of a single chip 1-D motion 
sensor in CCD/CMOS technology with a 115-pixel linear image sensor and CCD charge 
subtraction circuits to perform correlation. McQuirk is currently working on a one-chip 
design for a focus of expansion (FOE) detector using the direct approach developed by 
Negahdaripour and Horn [45]. The system architecture and results of a preliminary test chip 
are reported in [59]. In order to obtain a reasonable array size (64x64 in 2 μm technology),
McQuirk chose not to implement a fully parallel processor array, but to time-multiplex
the computation using one processor per column. Instead of the continuous-time gradient 
descent method performed by Tanner's optical flow chip, this system computes a discrete- 
time iterative approximation to minimize the associated error function. Results from the 
final design of the complete 64x64 array chip are not yet available. 

The common feature of the above systems is that they deal with only a very limited 
aspect of the problem. Most assume constant optical flow, and none allow for rotation. 
Given current technology limitations and the complexity of computing general motion, it is 
probably safe to conclude that it cannot be done with a single chip design at any time in 
the foreseeable future. One reason for designing simpler subsystems is so that they can be
combined to solve more complex problems. As yet, however, no one has built or proposed a 
complete system which includes the design of specialized processors for computing general 
motion, or relative orientation, in unrestricted environments. 

This is the problem which is addressed in this thesis. 



Chapter 3 



Matching Points in Images 



Having chosen to build the system based on an explicit matching approach, we must now 
determine the approach for finding the point correspondences. There are several reasons 
why obtaining accurate and reliable point correspondences is a hard problem. One is that 
the same features in two different images do not necessarily look the same due to differences 
in foreshortening. Features which appear in one image may be occluded or outside the field 
of view in the other, or there may be multiple solutions for matching if there are repeating 
patterns in the images. The high computational cost of computing similarity measures is 
an additional drawback to obtaining a large number of accurate matches. 

Methods which have been proposed for determining correspondences can be grouped 
into three broad categories: brightness-based methods, gray-level correlation, and edge- 
based methods. These differ primarily in the types of features used and in their strategy for 
solving the problem. Hybrid methods, which combine aspects from each of the approaches, 
have also been developed; however, these are best understood by examining the major 
categories individually. In this chapter, I will review the advantages and weaknesses of 
the different approaches and discuss their practicality for hardware implementation with 
respect to the goals of the present system. 

3.1 Brightness-Based Methods 

The idea in brightness-based methods is to avoid explicitly searching for the best match 
for each pixel by formulating a global minimization problem whose solution gives the relative 
displacement of every point. These methods are similar to those developed for computing 
optical flow by minimizing the error in the brightness change constraint equation. In fact,
there are only minor differences in the formulation of the two problems. In addition to 
avoiding search, the two primary advantages to this approach, which are not shared by the 
correlation and feature-based methods discussed next, are that information from the entire 
image is used in determining the offsets and that it is possible to obtain a dense set of 
correspondences, even in areas of the image which lack distinctive features. 

The procedure, as described in [49] and [60], consists of assuming that the gray-levels of
corresponding points are approximately the same and finding disparity functions $d_H(x, y)$
and $d_V(x, y)$ such that, ideally,

$$ E_L\!\left(x + \tfrac{1}{2}d_H(x, y),\; y + \tfrac{1}{2}d_V(x, y)\right) = E_R\!\left(x - \tfrac{1}{2}d_H(x, y),\; y - \tfrac{1}{2}d_V(x, y)\right) \tag{3.1} $$

where $E_L(x, y)$ and $E_R(x, y)$ are the brightness functions associated with the left and right
images respectively.

Due to variations and offset between the two sensors, it is not expected that equation
(3.1) can be solved exactly. Instead, the desired solution is the one which minimizes
an error function composed of different penalty terms. Horn [49, Chapter 13] suggested

$$ e_{tot}^2 = \iint \left(e_i^2 + \lambda^2 e_s^2\right) dx\,dy \tag{3.2} $$

where $e_i$ is the error resulting from the failure of (3.1) to hold exactly,

$$ e_i = E_L - E_R \tag{3.3} $$

and $e_s^2$ represents the departure from smoothness of the disparity functions as measured by
the squared Laplacian

$$ e_s^2 = (\nabla^2 d_H)^2 + (\nabla^2 d_V)^2 \tag{3.4} $$

The coefficient $\lambda^2$ defines the relative weighting of the two error terms.

Gennert [60] proposed a similar, though more elaborate, energy function which included 
a multiplicative model for the transformation of brightnesses in the two images 

$$ E_R \rightarrow m\,E_L \tag{3.5} $$

The multiplier m takes into account changes in reflectance due both to changes in albedo
and in the orientation of the surface being imaged.

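The only significant difference between equations (3.2) and (2.55), the error function
for optical flow, is that the constant brightness assumption is not expressed in terms of the
spatial and temporal derivatives of brightness. However, the derivatives reappear in the
Euler equations, which for equation (3.2) are

$$ \lambda^2 \nabla^4 d_H = \tfrac{1}{2}(E_R - E_L)\left(\frac{\partial E_L}{\partial x} + \frac{\partial E_R}{\partial x}\right) \tag{3.6} $$

$$ \lambda^2 \nabla^4 d_V = \tfrac{1}{2}(E_R - E_L)\left(\frac{\partial E_L}{\partial y} + \frac{\partial E_R}{\partial y}\right) \tag{3.7} $$

The functions in these equations are evaluated at the points $(x + \tfrac{1}{2}d_H(x, y),\, y + \tfrac{1}{2}d_V(x, y))$ and
$(x - \tfrac{1}{2}d_H(x, y),\, y - \tfrac{1}{2}d_V(x, y))$ in the left and right images, respectively.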

The aliasing problem discussed in Section 2.3.3 thus arises in a different form. If the 
image is not sufficiently bandlimited, or if good initial values for $d_H$ and $d_V$ are not available,
it is unlikely that a minimization procedure based on gradient descent will converge. If the 
derivatives are evaluated too far away from the correct point, the gradient will not point 
in the direction of the solution. It would thus be very difficult to implement this method 
in circuit form using analog networks as was done for optical flow by Tanner [54], and for 
finding the focus of expansion (FOE) by McQuirk [59]. Furthermore, it should be added 
that the full power of this method is not needed for computing 3-D motion since a dense 
set of point correspondences is not required to solve the rigid-body motion equations. 

3.2 Gray-level Correlation Techniques 

Gray-level correlation merits close attention since it is widely used in commercial ap- 
plications. The theoretical basis for the use of correlation in determining point matches 
between two images is the well known result from classical detection and estimation theory 
that, under certain conditions, an optimum decision rule for detecting the presence of a 
known signal in an observed noisy waveform can be obtained from the cross-correlation of 
the signal with the waveform [61]. The decision is based on whether the value of the correla- 
tion function is above a threshold determined from either a Bayes cost function or a desired 
false alarm rate in a Neyman-Pearson test. The position of the maximum correlation value
gives the most likely position of the signal, under the hypothesis that it is present. 




In a typical procedure, one image is divided into N, possibly overlapping, M×M blocks.
Let i, 1 ≤ i ≤ N, denote the ith block and let (j, k) be the coordinates in the first image of
the center of the block. The correlation function of the ith image block centered at (j′, k′)
in the second image is given by

$$ C_i(j', k') = \sum_l \sum_m E_L(j + l, k + m)\, E_R(j' + l, k' + m) \tag{3.8} $$

where the indices l and m range in integer steps from −(M − 1)/2 to +(M − 1)/2.
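The brute-force search implied by (3.8) can be sketched as follows. The images are assumed to be lists of rows indexed as E[row][col], with j taken as the row index; M is assumed odd, and no bounds checking is done, so the caller must keep the block and search window inside both images. Function names are hypothetical.

```python
def block_correlation(EL, ER, j, k, jp, kp, M):
    """C_i(j', k') from (3.8): block of E_L centred at (j, k) against E_R at (j', k')."""
    h = (M - 1) // 2
    return sum(EL[j + l][k + m] * ER[jp + l][kp + m]
               for l in range(-h, h + 1) for m in range(-h, h + 1))

def best_correlation_offset(EL, ER, j, k, M, radius):
    """Brute-force maximum of C_i over a (2*radius+1)^2 window around (j, k)."""
    best = None
    for dj in range(-radius, radius + 1):
        for dk in range(-radius, radius + 1):
            c = block_correlation(EL, ER, j, k, j + dj, k + dk, M)
            if best is None or c > best[0]:
                best = (c, dj, dk)
    return best[1], best[2]
```

Because raw correlation grows with the brightness of the block in the second image, its maximum can be pulled toward bright regions, which is one reason to prefer a normalized measure.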

The underlying assumption which makes the value of the cross-correlation function a 
sufficient statistic for testing the presence of a signal, is that the observed waveform is 
a stationary white Gaussian noise process upon which the signal may, or may not, be 
superimposed. If the background noise process is nonstationary, but is additive, white and 
Gaussian, an optimal test can still be formulated, but the detection threshold corresponding 
to a given false alarm rate will be a function of position and must be computed for each 
block. If the noise is non-white, a "whitening" filter should be applied before computing
the correlation function.

In real images, the background process is generally non-stationary and non-white. When
different cameras are used, sensor offset, combined with differences in the illumination of 
the same object viewed from two different positions, will ensure that the brightness values 
measured from the same feature will almost never be identical in the two images, even in the 
absence of other noise. In addition, with either one or two cameras, the background noise — 
which usually means the other features in the image as well as variations in the number 
of photons collected — will seldom have zero mean value. Practical methods for eliminating 
the effects of sensor and illumination differences are to preprocess the image data with a 
band-pass filter to both remove dc offsets and reduce the variance of high frequency noise, 
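or to compute the normalized correlation coefficient, defined by

$$ \rho_i(j', k') = \frac{\sum_l \sum_m \left(E_L(j + l, k + m) - \mu_L\right)\left(E_R(j' + l, k' + m) - \mu_R\right)}{M^2\, \sigma_L(j, k)\, \sigma_R(j', k')} \tag{3.9} $$

where $\mu_L$, $\mu_R$ and $\sigma_L$, $\sigma_R$ denote the sample means and standard deviations of $E_L$ and $E_R$
over their respective blocks.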
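A direct transcription of (3.9) is sketched below, using the $1/M^2$ form of the sample statistics. Returning 0 for a block with zero variance is an assumed convention, since $\rho_i$ is undefined there; the helper names are hypothetical.

```python
import math

def block(E, j, k, M):
    """The M x M block of E centred at (j, k), flattened row by row."""
    h = (M - 1) // 2
    return [E[j + l][k + m] for l in range(-h, h + 1) for m in range(-h, h + 1)]

def ncc(EL, ER, j, k, jp, kp, M):
    """rho_i(j', k') from (3.9), using population (1/M^2) statistics."""
    a, b = block(EL, j, k, M), block(ER, jp, kp, M)
    n = len(a)                        # n = M * M
    mu_a, mu_b = sum(a) / n, sum(b) / n
    sd_a = math.sqrt(sum((x - mu_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mu_b) ** 2 for y in b) / n)
    if sd_a == 0 or sd_b == 0:
        return 0.0                    # flat block: correlation undefined
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b))
    return cov / (n * sd_a * sd_b)
```

Scaling a block by a positive gain and adding an offset leaves the result unchanged, while a negative gain flips its sign.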

It can be easily verified that the normalized correlation coefficient is unchanged by any
transformation of the brightness functions of the form $E_L \rightarrow aE_L + c$, $E_R \rightarrow a'E_R + c'$ with positive gains a and a′. The search for the maximum
correlation value of the ith block is confined to a pre-specified window $W_i$ of area A in the
second image. The total computational cost for determining the best match for all N blocks
is therefore $O(NM^2A)$ if the function is evaluated at every offset of each search window.
If M is large enough, the total complexity may be reduced using fast Fourier transform
(FFT) techniques to $O(NA\log_2 A)$. The cost of preprocessing the images with a band-pass
filter is $O(L^2\log_2 L)$ for L×L images and is $O(L^2M^2)$ for computing the local means and
variances needed for the correlation coefficient in (3.9).

The cost of brute force search by computing the correlation function at every offset of a 
large search window is prohibitively high, even with modern fast processors. Systems which 
have been designed to perform matching based on gray-level correlation generally implement 
some intelligent method for reducing the search space. One simple method, suggested by 
Barnea and Silverman [62], is to use a sub-optimal, but more easily computed, similarity 
measure such as the sum of absolute values of differences 

$$ V_i(j', k') = \sum_l \sum_m \left| E_L(j + l, k + m) - E_R(j' + l, k' + m) \right| \tag{3.10} $$

with the best match being given by the location of the minimum value of $V_i$.
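The sum-of-absolute-differences search of (3.10) can be sketched in the same style, as a brute-force minimum over a square window. Ties are broken in favour of the smallest (dj, dk), an arbitrary choice. The names are hypothetical, and the caller must keep all indices inside both images.

```python
def sad(EL, ER, j, k, jp, kp, M):
    """V_i(j', k') from (3.10)."""
    h = (M - 1) // 2
    return sum(abs(EL[j + l][k + m] - ER[jp + l][kp + m])
               for l in range(-h, h + 1) for m in range(-h, h + 1))

def best_sad_offset(EL, ER, j, k, M, radius):
    """Offset (dj, dk) within a square search window that minimizes V_i."""
    candidates = ((sad(EL, ER, j, k, j + dj, k + dk, M), dj, dk)
                  for dj in range(-radius, radius + 1)
                  for dk in range(-radius, radius + 1))
    _, dj, dk = min(candidates)
    return dj, dk
```

Each candidate costs only $M^2$ subtractions and absolute values, with no multiplications, which is what makes this measure attractive for hardware.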

A detection test based on the sum of absolute values of differences is a computationally
efficient approximation to one based on the correlation coefficient, and, under certain conditions,
the two are in fact equivalent. If $E_L$ and $E_R$ are quantized to integer values, the
absolute value of the difference between two pixel values is always less than or equal to the
square of the difference. That is,

$$ V_i(j', k') = \sum_l \sum_m \left| E_L(j + l, k + m) - E_R(j' + l, k' + m) \right| \le \sum_l \sum_m \left( E_L(j + l, k + m) - E_R(j' + l, k' + m) \right)^2 \tag{3.11} $$

Using equation (3.9) and the definitions of the sample means and variances

$$ \mu = \frac{1}{M^2} \sum_l \sum_m E(j + l, k + m) \tag{3.12} $$

and

$$ \sigma^2 = \frac{1}{M^2} \sum_l \sum_m \left( E(j + l, k + m) - \mu \right)^2 \tag{3.13} $$
it can be shown that

$$ V_i(j', k') \le M^2 \left[ (\sigma_L - \sigma_R)^2 + (\mu_L - \mu_R)^2 + 2\sigma_L \sigma_R (1 - \rho_i) \right] \tag{3.14} $$

The bracketed expression is exactly the sum of squared differences divided by $M^2$, so the test which accepts a match when $\rho_i > r_c$ is equivalent to a test on the sum of squared differences, and any match it accepts also satisfies
$V_i < T_V = M^2[(\sigma_L - \sigma_R)^2 + (\mu_L - \mu_R)^2 + 2\sigma_L\sigma_R(1 - r_c)]$. Also, as long as
$(\sigma_L, \sigma_R)$ and $(\mu_L, \mu_R)$ are approximately constant over the search window, the position of
the minimum value of $V_i$ will be the same as that of the maximum value of $\rho_i$.
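The bound (3.14) rests on an exact identity. With the $1/M^2$ statistics of (3.12) and (3.13), the sum of squared differences equals the right-hand side of (3.14), and for integer-valued pixels the sum of absolute differences cannot exceed it. The small check below uses hypothetical helper names and arbitrary 3×3 test blocks, flattened into lists.

```python
import math

def stats(a):
    """Mean and (1/n) standard deviation of a flattened block."""
    n = len(a)
    mu = sum(a) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in a) / n)
    return mu, sd

def check_bound(a, b):
    """Return (SAD, SSD, right-hand side of (3.14)) for two equal-length blocks."""
    n = len(a)                                   # n = M^2
    mu_a, sd_a = stats(a)
    mu_b, sd_b = stats(b)
    rho = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / (n * sd_a * sd_b)
    sad = sum(abs(x - y) for x, y in zip(a, b))
    ssd = sum((x - y) ** 2 for x, y in zip(a, b))
    rhs = n * ((sd_a - sd_b) ** 2 + (mu_a - mu_b) ** 2 + 2 * sd_a * sd_b * (1 - rho))
    return sad, ssd, rhs
```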

Many commercial hardware systems for feature detection have been developed based on
the measures just described. A few examples among the many currently available systems
are: 1) the alignment system developed by Cognex Corporation, which uses normalized
correlation search [63]. This system, which is contained on a single 340 mm × 366 mm printed-circuit
board, along with image capture hardware, frame memory and other interfaces, uses
intelligent search strategies and clever programming tricks to achieve high-speed alignment.
In a recent brochure, Cognex claims to be able to align a 128×128 pixel template in a
500×400 pixel image in 200 milliseconds. 2) The real-time image processing and alignment
board by Sharp Digital Information Products, Inc., which fits into a personal computer.
Using software which runs on the host computer's CPU and which interfaces to two special-purpose
processor boards, they claim that the system can find a 100×100 template within
a 512×512 search area in less than 100 milliseconds. 3) The Max Video 20 system by
Datacube, Inc., which is perhaps the most widely used system, interfaces to a VME bus and
performs numerous image processing applications, along with alignment. 4) The STI3220
single-chip motion estimation processor designed by SGS-Thomson Microelectronics, used
to implement the MPEG data compression algorithm at video rates.¹ This chip finds
the minimum sum of absolute values of differences between two blocks over a maximum
displacement of +7/−8 pixels in both horizontal and vertical directions.

Some experimental designs have also been based on gray-level correlation. Recently Hakkarainen [65] developed a test system in analog CCD/CMOS technology to compute stereo disparities along a single horizontal row of pixels (parallel epipolar geometry assumed). The matching circuit incorporated a 40 × 40 pixel absolute-value-of-difference array designed to find candidate matches within a maximum disparity range of 11 pixels.



¹ MPEG is a motion picture compression technique based on coding the offsets between blocks in two frames. See [64] for more details.
Despite the preponderance of gray-level correlation in commercial vision applications
and elsewhere, it does have limitations which make it unsuitable for computing general 3-D 
motion. An important drawback is that it is very sensitive to differences in foreshortening 
of surfaces which are viewed from different angles. Unless the motion of the cameras is 
defined by a pure translation parallel to the image plane, and all objects in the scene are 
at the same depth, the two images will not simply be shifted versions of each other. The 
conditions for optimality of the correlation-based decision rule no longer hold in the presence 
of foreshortening since the signal is distorted. The systems just cited were developed for 
applications in which foreshortening is not a major problem. In industrial settings, the 
scene structure can be controlled, and the choice of the template to be matched is guided 
by the user. Furthermore, the parts to be located or aligned are usually confined to a single 
plane which is held at a fixed orientation to the optical axis. In the MPEG compression 
algorithm, small offset errors are not very important because the human visual system 
cannot perceive fine spatial detail in moving image sequences. In more general settings, 
however, particularly when long baselines are used as is often the case in binocular stereo, 
foreshortening cannot be neglected. 

A second limitation of gray-level correlation is that it is not by itself sufficient for 
computing reliable matches given arbitrary blocks within an image. The performance of 
a correlation test depends on the signal-to-noise ratio, which is low if the block does not 
contain distinctive features. For example, it is very difficult to find the correct offset for 
a patch of constant or smoothly varying brightness within a larger region also of constant 
or smoothly varying brightness. Industrial applications avoid this problem by using large, 
previously selected templates. However, if the blocks are chosen by arbitrarily dividing 
the image, there is no guarantee that each one can be reliably matched. Since further 
processing, such as finding edges, is required to determine the distinctiveness of each block, 
methods which are based on matching the edges themselves are more attractive in general 
than those based on gray-level correlation.

3.3 Edge-Based Methods 

Edges, which are locations of rapid and significant change in the image brightness function, are usually caused by changes in the surface reflectance or orientation of the imaged objects. These occur at changes in surface markings as well as at the boundaries of objects and are thus intrinsic characteristics of the scene which transform under the same rules of rigid body motion as the surfaces with which they are associated.

Figure 3-1: Example of matching edges despite foreshortening.

Among the advantages of using edges, as opposed to gray levels, as matching primitives are that they are insensitive to photometric effects and that they are less strongly affected than gray levels by foreshortening. A simple example of the latter issue is shown in Figure 3-1, which depicts a hypothetical perspective transformation of a wire-frame box. Although the lengths of some of the sides change, the edge patterns at the corner points retain enough similarity that they can be uniquely matched between the two views. The often-cited disadvantage of edge-based methods, namely that they can only generate sparse matches, is a problem for obtaining depth from binocular stereo pairs, but not for computing motion.

Edge-based methods can be divided into two categories according to the manner in which 
they represent edges. Methods in the first category operate directly on the binary image, 
or edge map, produced by the edge detection algorithm, using some form of correlation 
matching similar to those described for gray-level images. Methods in the second category, however, take the output of the edge detector and extract higher-level primitives, such as lines and corners. The attributes of these primitives, i.e., length, end-point coordinates,
direction, etc., are then compiled into a symbolic description of the principal features of 
the image, originally referred to by Marr as the full primal sketch [66]. Matching is then 
performed by searching the feature space for the best-matching sets of attributes. 

There are many possible variations on methods for using the binary representations of edge locations for correlation. Novak [67] lists several different similarity measures and discusses their relative merits. Wong [68] describes different processing techniques that can be applied to the edges to yield more reliable matches. Nishihara [69] developed a binary correlation method based on the sign representation of the image convolved with a Laplacian of Gaussian operator, $\nabla^2 G$, where $G$, given by

$$G = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \qquad (3.15)$$

is the Gaussian smoothing filter of bandwidth proportional to $1/\sigma$. The zero-crossings of the $\nabla^2 G$-filtered image, which occur at local maxima in the smoothed brightness gradient, were first proposed by Marr and Hildreth [70] as markers for edges. In Nishihara's method, edges are implicitly represented by encoding the sign bit of $\nabla^2 G * I$, where $I$ is the input image, rather than by explicitly locating its zero-crossings. He showed that this representation, whose autocorrelation function is sharply peaked at the origin, permits higher-resolution disparity measurements than correlation using the values of $\nabla^2 G * I$ itself. This algorithm has been implemented as a stand-alone system designed on a VME bus with a video-rate Laplacian-of-Gaussian convolver. The present version of the system allows disparity measurements at an arbitrary image location in approximately 400 microseconds. The sign-correlator algorithm operates on a single 6U (233.4 mm × 160 mm) VME bus board and implements 36 parallel correlators that run at a 10 MHz pixel rate. The Laplacian-of-Gaussian convolution is performed by a second 6U VME bus board that takes 10 MHz digital video input from the two cameras over a Datacube MAXbus to produce two 16-bit digital video raster signals. These two boards fit in a VME bus box along with a video digitizer board, a single-board computer, and a motor controller board for the camera head.²
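A rough software model can make the sign-correlation idea concrete. In the sketch below, $\nabla^2 G * I$ is approximated by separable Gaussian smoothing followed by a 5-point discrete Laplacian, only the sign bit is kept, and the disparity of a block is taken as the horizontal offset with the largest number of agreeing sign bits. The function names, the 3σ kernel truncation, and the purely horizontal search (parallel epipolar geometry) are assumptions made for this illustration; it does not describe Nishihara's hardware or his exact scoring.

```python
import numpy as np

def gaussian_1d(sigma):
    """Sampled, normalised 1-D Gaussian truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def log_sign(image, sigma):
    """Sign bit of an approximation to (nabla^2 G) * I.

    Approximation: separable Gaussian smoothing, then a 5-point Laplacian.
    """
    img = np.asarray(image, dtype=float)
    g = gaussian_1d(sigma)
    rows = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 1, img)
    smooth = np.apply_along_axis(lambda v: np.convolve(v, g, mode="same"), 0, rows)
    p = np.pad(smooth, 1, mode="edge")   # replicate borders for the Laplacian
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return lap > 0

def sign_disparity(sign_l, sign_r, j, k, m, max_disp):
    """Horizontal offset d maximising the number of agreeing sign bits between
    the M x M block at (j, k) in the left map and at (j, k + d) in the right."""
    template = sign_l[j:j + m, k:k + m]
    best_d, best_score = None, -1
    for d in range(-max_disp, max_disp + 1):
        if k + d < 0 or k + d + m > sign_r.shape[1]:
            continue
        cand = sign_r[j:j + m, k + d:k + d + m]
        score = np.count_nonzero(template == cand)   # XNOR count of sign bits
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

With a random 64 × 64 image and its copy shifted three columns to the right, the 16 × 16 block at (24, 24) is assigned a disparity of 3, since all 256 sign bits agree at that offset.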

From the viewpoint of detection theory, binary correlation methods based on the edge maps offer the same advantages as gray-level correlation without several of the disadvantages. As mentioned, edges are less sensitive than surfaces to foreshortening. In addition,
it is much easier to test the reliability of matches from edge-based correlation. As will be 
shown in Chapter 6, reliability is directly related to edge density, which can be determined 
by simply counting the number of edge pixels in the blocks being compared. 
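To make the edge-density test concrete, the sketch below correlates a binary edge block against candidate blocks in the second edge map, scores each offset by the number of coincident edge pixels, and declines to report a match when the block contains too few edge pixels. Both the coincidence score and the `min_edges` cutoff are placeholder choices for illustration, as is the function name; the scoring method and reliability tests actually used in this work are developed in Chapter 6.

```python
import numpy as np

def match_edge_block(edges_l, edges_r, j, k, m, max_disp, min_edges):
    """Return ((dj, dk), score) for the M x M edge block at (j, k), or None
    if the block holds fewer than min_edges edge pixels (deemed unreliable)."""
    el = np.asarray(edges_l, dtype=bool)
    er = np.asarray(edges_r, dtype=bool)
    block = el[j:j + m, k:k + m]
    n_edges = int(block.sum())          # edge density: count of edge pixels
    if n_edges < min_edges:
        return None
    best = None
    for dj in range(-max_disp, max_disp + 1):
        for dk in range(-max_disp, max_disp + 1):
            jp, kp = j + dj, k + dk
            if jp < 0 or kp < 0 or jp + m > er.shape[0] or kp + m > er.shape[1]:
                continue
            # Placeholder score: number of coincident edge pixels (logical AND).
            score = int(np.logical_and(block, er[jp:jp + m, kp:kp + m]).sum())
            if best is None or score > best[1]:
                best = ((dj, dk), score)
    return best
```

For a cross-shaped edge pattern of 15 pixels shifted by (1, 2) between the two maps, a 12 × 12 block enclosing the cross returns `((1, 2), 15)`, while a block containing no edges returns `None`.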

The accuracy of block-correlation techniques, gray-level or binary, is inherently limited, however, by the implicit assumption that the entire block is at the same disparity. The advantage of symbolic matching techniques is that they can provide true point-to-point matching, since corners are matched to corners and line segments to line segments. There are many algorithms which have been developed to perform search on the set of extracted image features. They differ primarily in their choice of attributes and in the constraints which are imposed to reduce both the search space and the number of false matches. Grimson [71] developed a hierarchical method based on a coarse-to-fine matching of zero-crossing contours of the same sign from the image convolved with Laplacian-of-Gaussian operators, $\nabla^2 G$, with different values of $\sigma$. He imposed both consistency and figural continuity constraints to limit false matches and to effectively map complete contours. Ayache and Faverjon [72] proposed creating neighborhood graph descriptions from the extracted line segments in each image and then determining the largest connected components of a disparity graph built from the two descriptions. Matches are validated by imposing global continuity constraints and rejecting any connected components with too few members. Fleck [73] recently proposed another variation on these methods by introducing a topological pre-matching filter which provides a stronger test than allowed by consistency and figural continuity constraints.

² H. K. Nishihara, personal communication, September 1992.

The primary disadvantage of symbolic matching techniques with respect to the design 
goals of the 3-D motion system is that they cannot both be easily implemented in simple 
hardware and be expected to operate at video rates. Reducing the edges to primary features 
and building the symbolic descriptions requires processing and memory resources that are 
beyond the capabilities of single-chip systems. Binary correlation, on the other hand, can 
be implemented relatively cheaply. The primary expense is not in computing the correlation 
measure, which requires much less hardware than if gray levels are used, but in initially 
computing the edge maps. In the sign-correlation system developed by Nishihara, for example, only one 6U VME board is devoted to the actual correlation operation, while two
boards and a Datacube image processor are required for capturing, digitizing, and filtering 
the images. 

As the overview in this chapter has shown, there is no simple method that can provide 
accurate and reliable point correspondences in all situations. The procedure which is best 
suited to the present system is the one that provides the best tradeoff between the require- 
ments for accuracy and simplicity. Among the different approaches for determining point 
correspondences, the block matching procedures based on computing similarity measures 
between edges are the simplest to implement in hardware. Since edges can be represented 
by binary values, computations can be performed as boolean operations. Furthermore, the
number of edges in each block provides an easily computed measure of the reliability of 
the match. The primary drawback of block matching procedures is their limited accuracy 
due to the assumption that the entire block is at the same depth. We will have to ensure 
that enough matches are found so that on average the error will go to zero in order for the 
motion algorithm to compute accurate estimates of the camera motion. 
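Because the edge maps are binary, comparing two blocks reduces to Boolean operations followed by a bit count. The sketch below packs each row of a block into an integer word and uses AND to count coincident edge pixels, XOR to count disagreeing pixels, and a plain popcount for the edge density of the reference block. The helper names are hypothetical, and which of these quantities enters the final matching score is left to Chapter 6.

```python
def pack_rows(block):
    """Pack each row of a 0/1 block into one integer word, first pixel as the MSB."""
    words = []
    for row in block:
        word = 0
        for bit in row:
            word = (word << 1) | (1 if bit else 0)
        words.append(word)
    return words

def popcount(word):
    """Number of 1 bits in a non-negative integer."""
    return bin(word).count("1")

def boolean_scores(words_l, words_r):
    """Return (coincident edges, mismatched pixels, edge count of the left block)."""
    coincident = sum(popcount(a & b) for a, b in zip(words_l, words_r))
    mismatched = sum(popcount(a ^ b) for a, b in zip(words_l, words_r))
    edges_l = sum(popcount(a) for a in words_l)
    return coincident, mismatched, edges_l
```

For the 2 × 4 blocks `[[1, 0, 1, 1], [0, 1, 0, 0]]` and `[[1, 1, 1, 0], [0, 1, 0, 0]]`, `boolean_scores(pack_rows(l), pack_rows(r))` gives `(3, 2, 4)`.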



Chapter 4 



System Architecture 



We can now formulate a plan for the architecture of a system to compute 3-D motion 
with specialized analog and digital VLSI processors based on the diagram of Figure 4-1. 
The two input images acquired at the different camera positions will be referred to as left 
and right, regardless of whether this terminology reflects their true spatial disposition. If 
the images are acquired by the same camera, the two input blocks should be considered as 
memory buffers. 
Figure 4-1: System block diagram

Two major trends can be discerned in observing the processing and data flow shown in 
this diagram. The first is that the amount of data decreases significantly from left to right, 
from the thousands of pixels in the input images, to the reduced binary edge maps, and 
then to the set of point correspondences that are used to compute the few numbers which 
characterize the motion. The second trend, however, is that computational complexity 
increases just as significantly in the same direction. The mix of analog and digital processing 
which is the most power- and area-efficient for a given task is largely determined by the 
ratio of data to the complexity of the operations performed on it.

Working backwards from right to left, we see that the ultimate goal of the processing 
performed on the two images is to extract a set of point correspondences which can be used
in the final stage to determine the motion. Given the complexity of solving the motion 
equations, it is clear that standard digital processing techniques are required. Any one of 
the many currently available powerful microprocessors, such as the TI TMS320C40, or the 
Motorola 68040, can certainly do the job. However, since the ultimate goal is to build a 
low-power system, we need to use the simplest processor that is adequate for the task. In 
this thesis, I did not attempt to design a minimal custom digital circuit to solve the motion 
equations. However, I will show, in Chapter 7, how the complexity of the motion algorithm 
can be reduced so that it can be implemented on a low-end processor or microcontroller. 

The set of point correspondences which are fed into the motion algorithm are best found 
by matching edges using binary correlation, as was concluded at the end of Chapter 3. In the 
second processing stage we can build a pipelined array of matching circuits, each of which 
computes for a specific patch in one edge map the translational offset which brings it into 
alignment with the most similar patch in the second edge map. The search is restricted to a 
predefined area whose dimensions should be user-controllable according to the application. 
Since the search window may need to be quite large if the baseline between the two camera 
positions is long or if any amount of rotation is involved, it is necessary to use a scoring 
method which has a very low false-alarm rate. Given that repeating patterns frequently 
occur in real scenes, a similarity measure alone is not sufficient to achieve an acceptable error 
rate. In Chapter 6, I will present the scoring method to be implemented by the matching 
circuits as well as discuss the tests which are included in the decision rule to minimize the 
number of false matches. 

Because the edge signals are binary, computing the scores for each offset requires rel- 
atively little circuitry and can be easily done in digital logic. Tallying the scores and 
determining the best match, however, are considerably more complex. In Part III of this 
thesis, I will describe the design of a mixed analog and digital circuit for finding the best 
match and compare it to a purely digital implementation. 

Edge detection is performed in the first stage of the motion system by operating di- 
rectly on the signals acquired by the photosensors. Here, there is a tremendous amount of 
data to be processed, but the operations involved, which are computing local averages and 
differencing neighboring pixel values, are relatively simple. We thus have a situation where 
analog processing can offer significant advantages over standard digital methods. Detecting
edges efficiently in analog VLSI, however, requires an algorithm which is adapted to the 
technology, unlike most standard methods which were designed for digital computers. The 
multi-scale veto (MSV) algorithm, presented in the next chapter, was thus developed to 
take advantage of operations which are easily performed in analog, while avoiding those 
that are not. Part II of this thesis is devoted to describing the design, fabrication and 
testing of a CCD-CMOS edge detector implementing the multi-scale veto algorithm, which 
is a prototype of one that could be used in the 3-D motion system presented above. 



Chapter 5 



An Algorithm for an Analog Edge Detector 



The bulk of hardware resources in almost all image processing systems is dedicated to 
data storage and to the initial operations performed on the brightness values acquired by the 
sensors. One image typically contains several tens to hundreds of thousands of pixels, each 
usually digitized to 8 bits. Even simple computations, such as adding and subtracting pixel values, require substantial processing due to the large amount of data involved. Given that relatively low precision is required, however, there is a clear opportunity for analog circuits to perform many of the initial processing tasks on the image data. Analog circuits which are specifically designed for a task can perform arithmetic and logic operations with 6 to 8 bits of precision in much less area than an equivalent digital implementation. Furthermore,
by remaining in the analog domain, there is no need to digitize the signal from the sensor 
before it can be processed. 

The multi-scale veto (MSV) algorithm described in this chapter was developed to solve 
two problems. The first was that we needed an edge detection algorithm which could 
be efficiently implemented on a fully parallel analog processor. The second was that we 
also needed an algorithm to accurately localize edges without being overly sensitive to 
noise. In this chapter I will discuss how both problems were addressed, presenting first 
some background on classical edge detection methods to explain why it was decided to 
develop a new method rather than to encode an existing one into a circuit design. I will 
also introduce, at a conceptual level, circuit models for implementing the MSV algorithm. 
The more detailed design description will be saved, however, for Part II, where the actual prototype processor which was built based on these models is presented. In the final section, I will present results from simulating the algorithm on a pair of image sequences which will be seen again in Chapters 6 and 7 to demonstrate the matching procedure and the motion algorithm.

It should be noted that most of this chapter is derived from a previously published 
paper [74] in which the multi-scale veto algorithm was first described. 

5.1 The Multi-Scale Veto Rule 

The problem of edge detection is to find and mark locations of significant change in the 
image brightness function that are due to changes in the reflectance of objects in the scene, 
while ignoring any changes caused by high spatial frequency features attributable to noise. 
'Noise' is not a well-defined term, however, as it is used to refer both to random fluctuations 
in the number of photons collected as well as to small-scale 'unimportant' features in the 
image. Noise can be removed by applying a linear low-pass smoothing filter. However, 
this has the effect of attenuating all high frequency components indiscriminately and in- 
troducing uncertainty in the edge locations. Nonlinear methods, such as median filtering, 
which preserve important edges and remove noise are also possible. However, these require 
more computation than linear filtering and generally cannot be implemented by convolu- 
tion. The MSV algorithm was designed to overcome the problems associated with standard 
linear filtering methods. Its circuit implementation is conceptually straightforward, and it 
incorporates a simple procedure for the user to select the types of features which are to be 
defined as noise. 

In the MSV algorithm, edges are defined as sharp changes in brightness which are 
significant over a range of spatial scales. In order to test for the presence of an edge, 
a sequence of low-pass filters of decreasing bandwidth is applied to the image, and the 
differences between the smoothed brightness values of neighboring pixels are computed. An 
edge exists between two pixels if the difference in their values is above a threshold, which 
is specified for each filter, at all levels of smoothing. If the threshold test is failed for any 
filter, the edge is vetoed. 

The rationale behind the multi-scale veto method can be explained by observing how it 
treats different types of features. Let $x_k[m,n]$ denote an array of sampled brightnesses which has been convolved with the $k$th low-pass filter. Let $y_k[m,n] = x_k[m,n] - x_k[m,n-1]$ denote the differences in the smoothed brightnesses in the direction of the second coordinate, and $G_k[m,n]$ the attenuation of the difference signal, such that $y_k[m,n] = G_k[m,n]\,y_0[m,n]$. The filters are ordered in decreasing bandwidth so that $G_k[m,n] > G_{k+1}[m,n]$. Let $\tau_k$ denote the threshold for the $k$th filter, and suppose that there is an abrupt change in brightness between $n = 0$ and $n = -1$ such that $y_0[m,0] = A$, with $A$ a positive constant. Formally stated, an edge will be marked at $n = 0$ only if

$$y_k[m,0] = G_k[m,0]\,A > \tau_k \qquad (5.1)$$

for $k = 0, \ldots, N-1$, where $N$ is the number of filters applied.

An example is illustrated in Figure 5-1 for the cases of an ideal step edge and an isolated noise spike. The step edge is marked at $n = 0$ because the differences $y_0[m,0]$ and $y_k[m,0]$ are both above threshold. No other locations are marked, although $y_k[m,n] \neq 0$ in general, and for some $n$ may even be greater than $\tau_k$, because $y_0[m,n] = 0$ for all $n \neq 0$. Hence the unsmoothed differences will veto the marking of an edge everywhere except at $n = 0$. In the case of the isolated noise spike, the difference at $n = 0$, which is of the same magnitude as for the step, passes the threshold test for the unsmoothed data. However, it fails for the smoothed data since the isolated spike is attenuated more strongly than the step edge, and hence no edge is marked. In general, it may be observed that while the bandwidth and threshold of the narrowest-band, or largest-scale, filter determine the effectiveness with which noise and small features are removed, the widest-band, or smallest-scale, filter determines the accuracy with which edges are localized.
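The veto rule is simple enough to model in a few lines. The 1-D sketch below stands in for one row of the image. It uses sampled Gaussian kernels (σ = 0 meaning no smoothing) in place of the resistive or CCD smoothing network, and it tests $|y_k|$ so that edges of either polarity are found; these substitutions and the function names are assumptions for illustration. Each output flag refers to the gap between pixels $n-1$ and $n$.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Sampled, normalised Gaussian truncated at 3 sigma; sigma == 0 means no smoothing."""
    if sigma == 0:
        return np.array([1.0])
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(x, sigma):
    """Low-pass filter a 1-D signal, replicating the end samples at the borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    return np.convolve(np.pad(x, r, mode="edge"), k, mode="valid")

def msv_edges_1d(x, sigmas, thresholds):
    """Multi-scale veto in 1-D.

    sigmas, thresholds: one (sigma_k, tau_k) pair per scale, k = 0..N-1.
    Returns a boolean array e with e[n - 1] True when an edge lies between
    pixels n - 1 and n, i.e. when |y_k| exceeds tau_k at every scale.
    """
    x = np.asarray(x, dtype=float)
    edges = np.ones(x.size - 1, dtype=bool)
    for sigma, tau in zip(sigmas, thresholds):
        y = np.diff(smooth(x, sigma))   # y[n - 1] = x_k[n] - x_k[n - 1]
        edges &= np.abs(y) > tau        # any failed test vetoes the edge
    return edges
```

With a step of height 10 and thresholds 5 and 1 for σ = 0 and σ = 2, only the step location survives, while a single-pixel spike of the same height is vetoed by the smoothed scale, reproducing the two cases of Figure 5-1.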

The idea of using multiple scales in edge detection is not new. It is the following features 
which distinguish the MSV algorithm from conventional methods. 

• Edges are not defined as local maxima in the magnitude of the gradient, or equiva- 
lently, as zero-crossings of the second derivative. Hence computation of second differ- 
ences is unnecessary. 

• All of the difference operations and threshold tests at different scales can be performed 
on the same physical network. 

Both features represent considerable savings in circuitry, which is crucial if the network
is to be designed for large image arrays. 

By definition, edges exist between two pixels on a discrete two-dimensional array. However, to avoid redefining the image grid, their locations are indicated in the output of the edge detection network by setting a binary flag at the locations of the pixels between which they occur. To simplify the discussion, these marked pixels will be referred to as edge pixels, although a more exact term would be edge-adjacent pixels.

(a) Response to an ideal step. Differences $y_0$ and $y_k$ are both above threshold at $n = 0$. The edge is marked at the point of change in the unsmoothed data.

(b) Response to ideal point noise. The difference $y_0$ in the unsmoothed image is above threshold $\tau_0$ at $n = 0$, but the difference $y_k$ in the smoothed image is below the threshold $\tau_k$. Hence no edge is marked.

Figure 5-1: Results of applying the multi-scale veto rule to an ideal step edge and to an impulse.

It should be noted that a significant consequence of defining edges by the multi-scale 
veto rule is that a very good visual approximation to the original image can be reconstructed 
from the (possibly smoothed) brightnesses at the edge pixels. This operation can be per- 
formed by a second processor which recomputes brightness values for non-edge locations by 
interpolation from the values at the edge pixels. It is thus possible to recover a smoothed 
version of the original image from noise-corrupted input while maintaining important high 
frequency information. A more complete analysis of this aspect of the MSV algorithm, 
along with examples of reconstructed noisy images can be found in the original paper [74]. 
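A minimal 1-D sketch of the reconstruction idea follows: non-edge pixels are filled in by linear interpolation between the values at the flagged edge pixels. Linear interpolation, treating the two end pixels as known, and the function name are assumptions made for this illustration; the reconstruction scheme analyzed in [74] may use a different interpolator.

```python
import numpy as np

def reconstruct_1d(values, edge_pixels):
    """Recompute non-edge pixels by linear interpolation from edge-pixel values."""
    values = np.asarray(values, dtype=float)
    known = np.asarray(edge_pixels, dtype=bool).copy()
    known[0] = known[-1] = True          # assumption: end points act as anchors
    idx = np.arange(values.size)
    return np.interp(idx, idx[known], values[known])
```

For an ideal step whose two edge pixels (9 and 10) are flagged, the interpolated signal equals the original step exactly; on noisy input, only the noise between edge pixels is replaced by the interpolant.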

5.2 Other Methods and the Use of Multiple Scales 

In most work in computer vision, edges are defined as the loci of maxima in the first 
derivative of brightness, and as such can be detected from zero-crossings in the second 
derivative. This is the basis on which many edge and line detectors, such as the Marr-Hildreth Laplacian-of-Gaussian (LOG) filter [70], the Canny edge detector [75], and the
Binford-Horn line finder [76], have been designed. The problem of finding edges by the 
numerical differentiation of images, however, is ill-posed [77]. Small amounts of noise, which 
are amplified by differentiation, can displace the zero-crossings or introduce spurious ones. 
A low-pass smoothing, or regularization, filter must be applied to stabilize the solution. 

The issue of scale arises because features in the image generally occur over a range of 
spatial scales. By varying the passband of the smoothing filter, one can select the size of 
the features which give rise to edges. Unfortunately, the information which permits the 
edge to be accurately localized to the feature which produced it is thrown out with the high 
frequency components. Marr and Hildreth first proposed finding edges from the coincident zero-crossings of different-sized LOG filters. Witkin [78] introduced the notion of scale-space filtering, in which the zero-crossings of the LOG are tracked as they move with scale
changes. These methods are a form of multi-scale veto, but the complexity of tracking the 
zero-crossings makes them ill-suited for implementation in specialized VLSI. 

An alternative solution to removing noise while retaining the high frequency information 
associated with large scale features is to apply nonlinear filtering. The median filter [79], 
for example, has long been used in image processing because it is particularly effective in 
removing impulse, or 'salt-and-pepper', noise. An approach put forward in recent years is
the idea of edge detection, or more precisely image segmentation, as a problem in mini- 
mizing energy functionals. The first proposal of this nature was the Markov Random Field 
(MRF) model of Geman and Geman [35]. In an MRF the minimum energy state is the 
maximum a posteriori (MAP) estimate of the energies at each node of a discrete lattice. 
The MAP estimate corresponds to a given configuration of neighborhoods of interaction. 
'Line processes' are introduced on the lattice to inhibit interaction between nodes which
have significantly different prior energies, thereby maintaining these differences in the final 
solution. 

Mumford and Shah [80] studied the energy minimization problem reformulated in terms 
of deterministic functionals to be minimized by a variational approach. Specifically, they 
proposed finding optimal approximations of a general function $d(x,y)$, representing the data, by differentiable functions $u(x,y)$ that minimize

$$J(u,\Gamma) = \mu^2 \iint_R (u - d)^2\,dx\,dy + \iint_{R \setminus \Gamma} |\nabla u|^2\,dx\,dy + \nu\,|\Gamma| \qquad (5.2)$$

where $u(x,y)$ is a piecewise smooth approximation to the original image, $R$ is the image domain, and $\Gamma$ is a closed set of singular points, in effect the edges, at which $u$ is allowed to be discontinuous, with $|\Gamma|$ its total length. The coefficients $\mu^2$ and $\nu$ are the weights on the different penalty terms.

Blake and Zisserman [81] referred to (5.2) as the 'weak membrane' model, since $J(u,\Gamma)$ resembles the potential energy function of an elastic membrane which is allowed to break in some places in order to achieve a lower energy state. If $\Gamma$ is known, the solution to the minimization problem can be found directly from the calculus of variations. However, the problem is to find both $\Gamma$ and $u(x,y)$ by trading off the closeness of the approximation and the number of discontinuities in the set. As a result the energy function $J(u,\Gamma)$ is
nonconvex and possesses many local stationary states that do not correspond to the global 
minimum. Blake and Zisserman were able to circumvent the problem of multiple local 
minima by developing a continuation method to solve the minimization problem iteratively. 
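The trade-off in (5.2) can be seen in a discrete 1-D analogue. The sketch below only evaluates the energy for a given approximation $u$ and break set; it does not minimize it, which is the non-convex part of the problem. Unit pixel spacing, counting breaks as the discrete stand-in for $|\Gamma|$, and the function name are assumptions for this illustration, not the exact formulation of Mumford and Shah or of Blake and Zisserman.

```python
import numpy as np

def weak_membrane_energy(u, d, breaks, mu2, nu):
    """Discrete 1-D analogue of eq. (5.2).

    u, d   : approximation and data (1-D arrays of equal length).
    breaks : boolean array of length len(u) - 1; breaks[i] cuts the link i -> i+1.
    """
    u = np.asarray(u, dtype=float)
    d = np.asarray(d, dtype=float)
    breaks = np.asarray(breaks, dtype=bool)
    data_term = mu2 * np.sum((u - d) ** 2)
    grad = np.diff(u)
    smooth_term = np.sum(grad[~breaks] ** 2)    # gradient penalised only off the break set
    break_term = nu * np.count_nonzero(breaks)  # discrete stand-in for nu * |Gamma|
    return data_term + smooth_term + break_term
```

For a step of height 10 with $u = d$, leaving the membrane intact costs $10^2 = 100$, while breaking it at the step costs only $\nu$; with $\nu = 20$ the break is preferred, which is exactly the kind of decision that makes the minimization non-convex.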

The weak membrane model was one of the first methods to be implemented in analog VLSI. Digital circuits for performing Gaussian convolution and edge detection began appearing in the early 1980s [82], [83]. The possibility of performing segmentation and smoothing with analog circuitry, however, did not seem practical until the problem had been posed in terms of a physical model. Harris [84] developed the first CMOS resistive fuse circuit, which is a two-terminal nonlinear element that for small voltages behaves as
a linear resistor, but 'breaks' if the voltage across its terminals becomes too large. Several 
implementations of resistive fuse networks have since been built to compute the minimiza- 
tion of the discrete form of equation (5.2) [57], [85]. Keast [86] developed a discrete-time 
version of the weak membrane model using CCDs to perform smoothing. 

Circuit implementations of the weak membrane model cannot escape the non-convexity 
problem, however, and some effort is required to push them to the globally optimum solu- 
tion. The MSV model is similar to the weak membrane in that it also assumes an image 
can be approximated by a collection of piecewise smooth functions. It is different, however, 
in that it does not formulate edge detection as an energy minimization problem. The oper- 
ations of edge detection and image reconstruction are completely separate and independent 
functions, so that there is no feedback coupling to generate alternate local minima. Hence 
for any image and given set of parameters, there is a unique set of edges which will be 
found. 

It should be noted that the edges produced by the MSV network are not as 'refined' as 
those produced by more complex methods such as Canny's edge detector [75]. This is in
part due to the way edges are defined. Since many feature boundaries are more like ramps 
than step edges, the MSV edges are often several pixels thick. It is also due to the need to 
make the circuitry as simple as possible in order to minimize silicon area. It is not easy to 
implement contour filling or thinning algorithms with simple circuits. The edges produced 
by the MSV algorithm are nonetheless functionally useful for many early vision tasks, and 
in particular, they will be shown to be useful for feature matching. 

5.3 Circuit Models 

It is not necessary to build a multi-layered processor in order to implement the multi-scale veto rule. By including time as a dimension, a single smoothing network with a controllable space constant can be used. It is well known that resistive grids, such as the one shown in Figure 5-2, can compute an analog smoothing function. The network shown is one-dimensional; however, it can easily be extended to two dimensions by connecting it via transverse resistors to parallel 1-D networks.

Figure 5-2: 1-D resistive smoothing network with controllable space constant.

By equating the current through the vertical resistor $R_v$ connected to the node voltage source $d_i$ with the sum of the currents leaving the node through the horizontal resistors $R_h$, one arrives at the resistive grid equation:

$$ u_i + \frac{R_v}{R_h}\sum_k (u_i - u_k) = d_i \tag{5.3} $$



where the subscript k is an index over the nearest neighbors of node i. The continuous 2-D 
approximation to this circuit is the diffusion equation 



$$ u - \lambda^2 \nabla^2 u = d \tag{5.4} $$

with

$$ \lambda = \sqrt{\frac{R_v}{R_h}} \tag{5.5} $$



which is the space constant, or characteristic length, over which a point source input will be smoothed. By varying the value of $R_v$, it is therefore possible to control the bandwidth of the effective low-pass filter applied to the data.
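To make the role of λ concrete, the following Python sketch solves the discrete equation (5.3) on a 1-D chain for a point-source input and measures how quickly the response decays away from the source. It assumes unit node spacing and open-circuit end nodes, and it uses a dense NumPy solver purely for illustration. The function names are hypothetical.

```python
import numpy as np

def resistive_grid_1d(d, rv_over_rh):
    """Solve u_i + (R_v/R_h) * sum_k (u_i - u_k) = d_i, equation (5.3), on a 1-D chain.

    Assumes open-circuit ends: the first and last nodes have a single neighbour.
    """
    n = len(d)
    a = float(rv_over_rh)
    A = np.eye(n)
    for i in range(n):
        for k in (i - 1, i + 1):          # nearest neighbours of node i
            if 0 <= k < n:
                A[i, i] += a
                A[i, k] -= a
    return np.linalg.solve(A, np.asarray(d, dtype=float))

def decay_length(rv_over_rh, n=201):
    """Characteristic length of the response to a point source at the chain centre."""
    d = np.zeros(n)
    d[n // 2] = 1.0
    u = resistive_grid_1d(d, rv_over_rh)
    rho = u[n // 2 + 1] / u[n // 2]       # away from the ends the response falls off as rho**|i|
    return -1.0 / np.log(rho)

for ratio in (1.0, 4.0, 16.0):
    print(ratio, round(decay_length(ratio), 2), round(np.sqrt(ratio), 2))
```

With the ratios 1, 4 and 16, the printed decay lengths are about 1.04, 2.02 and 4.01 nodes. These are close to λ = 1, 2 and 4, and the agreement improves as R_v/R_h grows.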

A practical way to build a controllable smoothing network is to simulate the resistors with charge-coupled devices (CCDs). CCDs are best known for their role as image sensors, but they are also capable of performing more advanced signal processing. CCDs operate in the manner of a bucket brigade. The 'buckets' are potential wells under polysilicon gates, and the depths of the wells are determined by the voltages applied to the gates. The 'water' in the buckets is the signal charge. This charge can be transferred, mixed, and separated between the potential wells by varying the sequences of the clock phases that drive the gates. CCDs are built by juxtaposing gates of alternating layers of polysilicon. When CCDs are used as image sensors, the gates are held at a high potential to collect the charge generated when photons with energy greater than the bandgap of silicon hit the device and create electron-hole pairs. After a suitable integration time, generally a millisecond or so, the CCDs then function as analog shift registers to move the signal charges out of the camera.

Figure 5-3: 2-D MSV edge detection array using charge-coupled devices (CCDs).
Figure 5-4: Conceptual model of the 'Edge precharge circuit'.
The basic layout of the 2-D network required to use CCDs for the MSV edge detection 
algorithm is shown in Figure 5-3. It consists of a grid of orthogonal horizontal and verti- 
cal transfer channels with circuitry placed between the nodes to compute differences and 
perform the threshold tests. The numbers on the gates signify the different clock phases 
which are used to move signal charges in the array. The structure of this network is the 
same as that developed by Keast to implement a CCD 'resistive fuse' network [86], [57]. By 
appropriately sequencing the clock phases, this array can perform smoothing by averaging 
the signal charge held under each node with each of its neighbors. Specifically, it applies 

the convolution kernel

$$ \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{5.6} $$

to the image signal with each smoothing cycle. After two cycles the image has been effectively convolved with

$$ \frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \tag{5.7} $$

and so on. The bandwidth of the smoothing filter is thus controlled by the number of cycles performed.
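The following NumPy sketch composes kernel (5.6) with itself to confirm (5.7), and shows how the width of the effective filter grows with the number of cycles. The helper names are hypothetical. Kernel (5.6) is the outer product of [1 2 1]/4 with itself, so k cycles give the outer product of the binomial row C(2k, j)/4^k. That row has a per-axis variance of k/2, which means the effective filter width grows roughly as the square root of the number of cycles.

```python
import numpy as np

# Kernel (5.6), applied once per smoothing cycle.
K1 = np.array([[1, 2, 1],
               [2, 4, 2],
               [1, 2, 1]]) / 16.0

def conv2d_full(a, b):
    """Full 2-D linear convolution of two small arrays."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for i in range(b.shape[0]):
        for j in range(b.shape[1]):
            out[i:i + a.shape[0], j:j + a.shape[1]] += b[i, j] * a
    return out

def effective_kernel(cycles):
    """Kernel equivalent to `cycles` successive passes of (5.6)."""
    k = np.array([[1.0]])
    for _ in range(cycles):
        k = conv2d_full(k, K1)
    return k

def axis_variance(kernel):
    """Variance of the kernel along one axis (its marginal distribution)."""
    row = kernel.sum(axis=0)
    x = np.arange(row.size) - (row.size - 1) / 2
    return float((row * x ** 2).sum())

print((256 * effective_kernel(2)).astype(int))   # the integer matrix of (5.7)
print([axis_variance(effective_kernel(k)) for k in range(1, 6)])
```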

The primary difference between Keast's design and the MSV network is in the functions 
performed by the circuits placed between the nodes. The multi-scale veto rule is imple- 
mented by the edge precharge circuits, shown in Figure 5-4 and indicated by the boxes 
labeled EPC in Figure 5-3. In each of these, a capacitor is initially charged with an 'edge' 
signal. At each smoothing cycle, the absolute value of the difference between the node volt- 
ages is compared to a threshold; and if the threshold is greater, the capacitor is discharged. 

The complete execution of the multi-scale veto algorithm consists of the following steps: 
The array is initialized by transferring signal charge proportional to image brightness under 
each node gate (pixel) and by charging the edge capacitors. The signal charge is formed 
either by direct acquisition using the CCD array, or by loading the pixel values from an 
off-chip sensor. Several smoothing cycles (typically 5 to 10), each accompanied by the threshold tests, are then performed.

When these are completed, the edge charges from the four precharge circuits connected 
to each node are tested; and, if any of them is non-zero, i.e., if an edge was detected between 
the node and one of its four neighbors, a binary value is set at the node to indicate that it 
is an edge pixel. 
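The steps above can be imitated in software. The sketch below is a floating-point model of the rule, not of the CCD charge-transfer circuitry. A boolean flag per link between 4-neighbours stands in for each edge capacitor, and a flag is cleared ("discharged") whenever the difference across it falls below the current threshold. The following are all my assumptions: the function names, the replicated image borders, and applying τ_0 to the unsmoothed image before the first smoothing cycle. For the step-edge thresholds, the sketch uses G_k = C(2k, k)/4^k, which reproduces the horizontal step-edge row of Table 5.1.

```python
import math
import numpy as np

def smooth(u):
    """One smoothing cycle with kernel (5.6); image borders are replicated (an assumption)."""
    p = np.pad(u, 1, mode="edge")
    h = (p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]) / 4.0     # horizontal [1 2 1]/4
    return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0  # vertical [1 2 1]/4

def msv_edges(image, thresholds):
    """Multi-scale veto: thresholds[k] is tau_k, tested after k smoothing cycles."""
    u = np.asarray(image, dtype=float)
    # One 'edge capacitor' per link between 4-neighbours, initially charged (True).
    right = np.ones((u.shape[0], u.shape[1] - 1), dtype=bool)
    down = np.ones((u.shape[0] - 1, u.shape[1]), dtype=bool)
    for k, tau in enumerate(thresholds):
        if k > 0:
            u = smooth(u)
        # Discharge (veto) every link whose difference falls below the threshold.
        right &= np.abs(np.diff(u, axis=1)) >= tau
        down &= np.abs(np.diff(u, axis=0)) >= tau
    # A pixel is an edge pixel if any of its four links is still charged.
    edge = np.zeros(u.shape, dtype=bool)
    edge[:, :-1] |= right
    edge[:, 1:] |= right
    edge[:-1, :] |= down
    edge[1:, :] |= down
    return edge

# Step-edge model thresholds: tau_k = G_k * tau_0 with G_k = C(2k, k) / 4**k.
tau0 = 0.5
taus = [tau0] + [tau0 * math.comb(2 * k, k) / 4 ** k for k in range(1, 6)]

img = np.zeros((30, 40))
img[:, 30:] = 1.0        # vertical step edge of unit contrast
img[15, 8] = 1.0         # isolated one-pixel impulse
edges = msv_edges(img, taus)
```

In this example, the step edge survives every cycle and marks the two pixel columns on either side of it. The impulse passes the unsmoothed test but is vetoed at the first smoothing cycle.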



5.4 Choosing the Parameters 

It might seem that the number of free parameters that must be specified to apply the multi-scale veto rule (the different thresholds for each smoothing filter, as well as the number of smoothing cycles) would make the method impractical or even arbitrary. However, there are simple ways to choose the parameters based on the types of features which one wishes to retain.

The edges which are marked by the edge detection network are those which pass the threshold test at all smoothing cycles. Following the same notation used in the example given earlier, let $k$ denote the number of smoothing cycles performed, and let $\tau_k$ denote the threshold for the $k$th cycle. The smoothing network implements the convolution kernel (5.6). Using this kernel, the attenuation factors $G_k$ of the difference signals for several idealized features have been computed at each cycle; they are given in Table 5.1.

Attenuation Factors
                                    Smoothing Cycle
Feature                       1      2      3      4      5
Horizontal step edge        0.500  0.375  0.313  0.273  0.246
Diagonal step edge          0.375  0.273  0.226  0.196  0.176
Horizontal 1-pixel line     0.250  0.125  0.078  0.055  0.041
Diagonal 1-pixel line       0.125  0.055  0.032  0.022  0.016
Horizontal 2-pixel line     0.500  0.313  0.219  0.164  0.129
Diagonal 2-pixel line       0.313  0.164  0.105  0.074  0.056
1-pixel impulse noise       0.125  0.047  0.024  0.015  0.010
4-pixel square impulse      0.375  0.195  0.120  0.081  0.058
Horizontal 3-pixel ramp     0.750  0.688  0.641  0.602  0.568

Table 5.1: Attenuation factors for different types of features as a function of smoothing.

The ideal step edge refers to a two-dimensional feature which is infinite in one dimension 
but has an abrupt change from one pixel to the next in the other dimension. The ideal line 
corresponds to back-to-back step edges facing in opposite directions so that its 1-D cross- 
section resembles that of the impulse in Figure 5-1. The labels '1-pixel line' and '2-pixel 
line' in Table 5.1 refer to the width of the 1-D impulse. Impulse noise is a local abrupt 
change in brightness which is finite in both dimensions. Here, the labels '1-pixel impulse' 
and '4-pixel square impulse' refer to the area of the local discontinuity. Finally, the ideal 
ramp is a feature similar to a step edge, but for which the change in brightness occurs
over several pixels (in this case 3) rather than abruptly. Some graphic examples of these 
features are shown in Figure 5-5. 
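The attenuation factors in Table 5.1 can be recomputed under one interpretation, which is my assumption and is not stated in the text. Under this interpretation, G_k is the difference across the feature's original boundary after k smoothing cycles with kernel (5.6), divided by the original contrast. The sketch below uses that interpretation. It builds each idealized unit-contrast feature on a 41 × 41 grid and measures the difference between the centre pixel and its left neighbour. The ramp row is not reproduced because its exact brightness profile is not specified. The feature masks and names are hypothetical.

```python
import numpy as np

def smooth(u):
    """One cycle of kernel (5.6); borders replicated, which does not affect the centre here."""
    p = np.pad(u, 1, mode="edge")
    h = (p[:, :-2] + 2 * p[:, 1:-1] + p[:, 2:]) / 4.0
    return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0

N, c = 41, 20
y, x = np.mgrid[0:N, 0:N] - c     # row and column offsets from the centre pixel

# Idealized unit-contrast features; each has a boundary between columns c-1 and c on row c.
FEATURES = {
    "horizontal step edge":    x >= 0,
    "diagonal step edge":      x + y >= 0,
    "horizontal 1-pixel line": x == 0,
    "diagonal 1-pixel line":   x + y == 0,
    "horizontal 2-pixel line": (x >= 0) & (x <= 1),
    "diagonal 2-pixel line":   (x + y >= 0) & (x + y <= 1),
    "1-pixel impulse noise":   (x == 0) & (y == 0),
    "4-pixel square impulse":  (x >= 0) & (x <= 1) & (y >= 0) & (y <= 1),
}

def attenuation_factors(mask, cycles=5):
    """Difference across the original boundary after each cycle (original contrast = 1)."""
    u = mask.astype(float)
    factors = []
    for _ in range(cycles):
        u = smooth(u)
        factors.append(float(u[c, c] - u[c, c - 1]))
    return factors

for name, mask in FEATURES.items():
    print(f"{name:24s}", " ".join(f"{g:.3f}" for g in attenuation_factors(mask)))
```

The computed values agree with Table 5.1 to within rounding. An exact value such as 0.3125 may print as 0.312 rather than 0.313, because Python rounds exact halves to even.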

We also distinguish between horizontal features, which are aligned with the rectangular pixel grid, and diagonal features, which are oriented at 45° with respect to the grid. The values in the table show that diagonal features are attenuated somewhat more than horizontal ones because of the nature of the smoothing operator. Consequently, edges aligned with the grid are favored over skewed edges. An isotropic operator could be implemented with a hexagonally connected network. However, based on the numerous simulations performed to produce edges for the matching algorithm, the slight improvement that an isotropic operator would provide does not seem to warrant the added complexity in the design.

Figure 5-5: Ideal 2-D image features: a.) horizontal step edge; b.) horizontal 3-pixel ramp; c.) horizontal 1-pixel line; d.) 4-pixel square impulse.

The values in Table 5.1 can be used as a guide for setting the parameters for more general types of features. As a specific example, suppose we want to retain only the boundaries of large objects in the image and remove all small-scale features. The threshold $\tau_0$ can be set as a function of the contrast in the image. We can perform 5 smoothing cycles and set $\tau_5 = 0.246\,\tau_0$. Features resembling step changes in brightness which passed the threshold at $k = 0$ will have little trouble passing the test at $k = 5$. By contrast, features resembling 4-pixel square impulses will need an original difference greater than $4.2\,\tau_0$ (since $0.246/0.058 \approx 4.2$) in order to pass. A simpler method to generate all the thresholds is to choose one idealized feature as a model and to compute

$$ \tau_k = G_{kj}\,\tau_0 \tag{5.8} $$

where $G_{kj}$ is the attenuation factor for the model feature $j$ at the $k$th smoothing cycle. For the previous example, the model used was the horizontal step edge. In another case, in which we only want to eliminate impulse noise while retaining thin lines, we might choose the diagonal 2-pixel line as a model. In an actual implementation, the values in Table 5.1 can be held in a ROM and supplied to the MSV processor at each smoothing cycle.
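The threshold rule (5.8) amounts to a table lookup, as in the short sketch below. It copies three rows of Table 5.1, builds the schedule τ_0, τ_1, ..., τ_5 from a chosen model feature, and computes the smallest original contrast (in units of τ_0) that another feature needs to survive every test. That last calculation relies on the smoothing being linear, so a feature's difference signal scales with its contrast. The dictionary and function names are hypothetical.

```python
# Rows of Table 5.1 (smoothing cycles 1..5).
TABLE_5_1 = {
    "horizontal step edge":   [0.500, 0.375, 0.313, 0.273, 0.246],
    "diagonal 2-pixel line":  [0.313, 0.164, 0.105, 0.074, 0.056],
    "4-pixel square impulse": [0.375, 0.195, 0.120, 0.081, 0.058],
}

def threshold_schedule(model, tau0):
    """Equation (5.8): tau_k = G_kj * tau_0, with tau_0 applied before any smoothing."""
    return [tau0] + [g * tau0 for g in TABLE_5_1[model]]

def min_contrast_to_survive(feature, model):
    """Smallest original difference, in units of tau_0, that keeps `feature` at or above
    every threshold of the `model` schedule (the k = 0 test alone needs a ratio of 1)."""
    ratios = [gm / gf for gm, gf in zip(TABLE_5_1[model], TABLE_5_1[feature])]
    return max([1.0] + ratios)

print(threshold_schedule("horizontal step edge", 1.0))
print(min_contrast_to_survive("4-pixel square impulse", "horizontal step edge"))
```

For the example in the text, the binding test is the last one, which gives 0.246/0.058 ≈ 4.24 and matches the 4.2 τ_0 quoted above.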

5.5 Simulations 

The results of simulating the MSV algorithm on four images are shown in Figures 5-6 
and 5-7. These images are from two motion sequences, one simulated and one real, which 
will be used again in the following chapters to demonstrate the matching procedure and the 
results of the motion algorithm. 

In the first sequence, Figure 5-6, the left image is a picture of a poster (of Neil Armstrong) taken by a Panasonic CCD camera, while the right image was generated by a computer-simulated motion applied to the first. The reason for generating simulated motion is to be able to test the results of the motion computation against known values. The motion simulation program assumes an image of a planar surface at a user-supplied depth and orientation. The focal length, principal point, and x, y pixel spacing are input to the program to compute ray directions using the pinhole camera model. For the astronaut images, which are 400x400 pixels, the focal length and pixel spacing were such that the effective field of view, measured from the optical axis, was 55°. In the right image, the surface is modeled as a frontal plane (parallel to the image plane) at a depth of 10 (baseline) units, and the motion was a translation of 1 unit in the positive x direction with a 5° rotation about the y axis.

In the second sequence, Figure 5-7, both images were taken by a Cohu digital CCD 
camera rigidly mounted on a movable carriage that could be translated along a fixed rail. 
The carriage assembly could be rotated on both the vertical and horizontal axes so that 
the camera could be oriented in any direction, with a positional accuracy of better than 0.1° on each axis. The advantage of the digital camera is that each pixel corresponds exactly to one sensor location, and there is no frame grabber in the path to resample and resize the data. The internal calibration matrix is thus very close to that given by the geometry of the image sensor, which is a 6.4 mm × 4.8 mm CCD array with 756 (horizontal) and 484 (vertical) pixels. A 4.8 mm lens was used to give an approximately 40° field of view, measured from the
optical axis. The motion for the pair of images shown was a translation in the x direction 
followed by a rotation of 5° about the z axis. It should be emphasized, however, that this 
corresponds to the motion of the camera with respect to the motion stage coordinate system 
and not with respect to its own coordinate system. 
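Under the pinhole model, the quoted field of view follows from the sensor size and focal length. The sketch below computes the angle from the optical axis to the sensor edges and to the sensor corner. Reading "approximately 40°, measured from the optical axis" as the half-angle to the corner is my interpretation, and the helper name is hypothetical.

```python
import math

def half_angle_deg(half_extent_mm, focal_mm):
    """Angle from the optical axis to a sensor point half_extent_mm off-centre (pinhole model)."""
    return math.degrees(math.atan(half_extent_mm / focal_mm))

width_mm, height_mm, focal_mm = 6.4, 4.8, 4.8       # sensor size and lens from the text
horizontal = half_angle_deg(width_mm / 2, focal_mm)
vertical = half_angle_deg(height_mm / 2, focal_mm)
corner = half_angle_deg(math.hypot(width_mm, height_mm) / 2, focal_mm)
print(f"{horizontal:.1f} {vertical:.1f} {corner:.1f}")   # 33.7 26.6 39.8
```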

In applying the edge detection algorithm, step edge models were used for both sets of 
images, and 7 smoothing cycles were applied. The results are shown as the binary images 
below the originals. It should be noted that the apparent thickness of the edges is due in 
part to the method of marking both pixels on either side of the change in brightness, and in 
part to the presence of many brightness gradations (ramp-like edges) in the scenes. We will 
continue with these same image sequences in the following chapters for testing the matching 
procedure and the motion algorithm. 
Figure 5-6: Simulated motion sequence: Frontal plane at 10 (baseline) units. Motion is given by: $b = (1, 0, 0)$, $\theta = 5°$, $\hat{\omega} = (0, 1, 0)$.
Figure 5-7: Real motion sequence: Motion with respect to motion stage coordinate system (not camera system): $b = (1, 0, 0)$, $\theta = 5°$, $\hat{\omega} = (0, 0, 1)$.



Chapter 6 



The Matching Procedure 



There are two important aspects to the problem of finding the set of point correspondences needed for computing motion. The first is that only a sparse set of very reliable matches distributed across the field of view is needed. The second concerns the search area. Unlike the case where the epipolar geometry is known, there is no fixed relation between the positions of corresponding points in the left and right images, so the search must usually be conducted over a large area. Consequently, the matching procedure must have both a good detection rate and a very low false alarm rate to minimize the number of incorrect matches.

In the first part of this chapter, I will derive the basic procedure which will be used and 
show how thresholds can be set to ensure adequate detection and false alarm rates. I will 
then present the results of applying the procedure to the edge maps from the sequences 
shown in the previous chapter. 

6.1 Finding the Best Match 

Following the usual procedure for block matching, we divide one image into $N$, possibly overlapping, $M \times M$ blocks. Since the edge maps are binary, we can simplify the equations by adopting the following notation. Let $i$, $1 \le i \le N$, denote the $i$th block and define

$P$ = total number of pixels in each block $= M^2$
$B_i$ = set of pixels corresponding to an edge in the $i$th block of the left, or base, edge map.
$\bar{B}_i$ = the set complement of $B_i$ within the $i$th block.
$S_{jk}$ = set of edge pixels in the $M \times M$ block centered at coordinate location $(j, k)$ in the right, or second, edge map.
$\bar{S}_{jk}$ = the set complement of $S_{jk}$.

When it is convenient, the lowercase variables $s$, $\bar{s}$, $b$, $\bar{b}$ will be used to refer to individual pixels within the four sets, and $\|\cdot\|$ denotes the number of pixels in a set.

Using this notation, similarity measures can now be expressed by ordinary set operations. The normalized binary correlation function is given by

$$ C_{ijk} = \frac{\|S_{jk} \cap B_i\|}{\left(\|S_{jk}\| \cdot \|B_i\|\right)^{1/2}} \tag{6.1} $$

and the absolute value of difference function by

$$ V_{ijk} = \frac{1}{P}\left(\|S_{jk} \cap \bar{B}_i\| + \|\bar{S}_{jk} \cap B_i\|\right) \tag{6.2} $$

which has been normalized so that $0 \le V_{ijk} \le 1$. Unlike the case of gray-level images, these measures are always equivalent for binary data since they are related by the equality

$$ P\,V_{ijk} = \|S_{jk}\| + \|B_i\| - 2\left(\|S_{jk}\| \cdot \|B_i\|\right)^{1/2} C_{ijk} \tag{6.3} $$

The absolute value of difference, $V_{ijk}$, however, is simpler to compute and so is preferred over the correlation function.
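For binary blocks, $V_{ijk}$ reduces to counting the pixels where the two blocks disagree, which is an exclusive-OR followed by a population count. The correlation needs a product and a square root. The NumPy sketch below computes both measures on a pair of random blocks (densities chosen arbitrarily) and checks identity (6.3) numerically. The function names are hypothetical.

```python
import numpy as np

def abs_diff_score(S, B):
    """V_ijk of equation (6.2): fraction of pixels where the two binary blocks disagree."""
    return np.count_nonzero(S != B) / S.size

def correlation_score(S, B):
    """C_ijk of equation (6.1): normalized binary correlation."""
    return np.count_nonzero(S & B) / np.sqrt(np.count_nonzero(S) * np.count_nonzero(B))

rng = np.random.default_rng(0)
M = 24
P = M * M
B = rng.random((M, M)) < 0.2     # hypothetical base block, edge density about 0.2
S = rng.random((M, M)) < 0.3     # hypothetical second block, edge density about 0.3

# Both sides of identity (6.3).
lhs = P * abs_diff_score(S, B)
rhs = (np.count_nonzero(S) + np.count_nonzero(B)
       - 2 * np.sqrt(np.count_nonzero(S) * np.count_nonzero(B)) * correlation_score(S, B))
print(lhs, rhs)
```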

The most likely position of the match occurs at the minimum value of $V_{ijk}$, denoted $V^*$. The decision to accept or reject the best match is based on the result of the comparison

$$ V^* < \tau \tag{6.4} $$

Equation (6.4) is in the form of a binary hypothesis test that selects between the hypotheses $H_0$, that the match is false, and $H_1$, that the match is correct. The decision threshold $\tau$ is chosen to achieve a given detection or false alarm rate. Although only $V^*$ must satisfy (6.4), any offset at which $V_{ijk} < \tau$ should be considered as a potential match.

Formally, the detection rate is defined as the probability that $V_{ijk}$ will be below the threshold given that $H_1$ is true,

$$ P_D = \Pr(V_{ijk} < \tau \mid H_1) \tag{6.5} $$

while the false alarm rate is the probability that the test will be passed given that the match is incorrect [61],

$$ P_F = \Pr(V_{ijk} < \tau \mid H_0) \tag{6.6} $$

As indicated by the diagram in Figure 6-1, it is not necessary to explicitly compute $P_D$ and $P_F$ to determine $\tau$, but only to find the mean and variance of $V_{ijk}$ under the hypotheses $H_0$ and $H_1$. The Chebyshev inequality [87] ensures that

$$ P_D = 1 - \Pr(V_{ijk} \ge \tau \mid H_1) \ge 1 - \frac{\mathrm{Var}[V_{ijk} \mid H_1]}{(\tau - \mu_{H_1})^2} \tag{6.7} $$

and

$$ P_F = \Pr(V_{ijk} < \tau \mid H_0) \le \frac{\mathrm{Var}[V_{ijk} \mid H_0]}{(\mu_{H_0} - \tau)^2} \tag{6.8} $$

where $\mu_{H_0}$ and $\mu_{H_1}$ are the means of $V_{ijk}$ under the two hypotheses, and the bounds apply when $\mu_{H_1} < \tau < \mu_{H_0}$.

Figure 6-1: Threshold selection for a decision rule that chooses between two hypotheses $H_0$ and $H_1$.

To compute $\mu_{H_0}$ and $\sigma^2_{H_0}$, we assume that the distributions of edge pixels within the two blocks are independent, since they correspond to different features. We further assume that the values of the pixels within the same block are independent and identically distributed (i.i.d.) Bernoulli random variables. Although the values of neighboring edge pixels are certainly correlated, the assumption of independence is reasonable over the entire block if it is large enough. Let $p_b$ denote the probability that any pixel in the block from the base image is a 1, and let $p_s$ denote the same probability for any pixel in the block from the second image. Then

$$ \mu_{H_0} = E[V_{ijk} \mid H_0] = p_s(1 - p_b) + p_b(1 - p_s) \tag{6.9} $$

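The variance $\sigma^2_{H_0}$ is found by rewriting (6.2) as

$$ V_{ijk} = \frac{1}{P}\left(\|S_{jk}\| + \|B_i\| - 2\,\|S_{jk} \cap B_i\|\right) \tag{6.10} $$

Then, since $\|S_{jk}\|$ and $\|B_i\|$ are independent under $H_0$,

$$ \begin{aligned} \sigma^2_{H_0} = \mathrm{Var}[V_{ijk} \mid H_0] &= \frac{1}{P^2}\Big[\mathrm{Var}(\|S_{jk}\|) + \mathrm{Var}(\|B_i\|) + 4\,\mathrm{Var}(\|S_{jk} \cap B_i\|) \\ &\qquad - 4\,\mathrm{Cov}(\|S_{jk}\|, \|S_{jk} \cap B_i\|) - 4\,\mathrm{Cov}(\|B_i\|, \|S_{jk} \cap B_i\|)\Big] \\ &= \frac{1}{P^2}\Big[P p_s(1 - p_s) + P p_b(1 - p_b) + 4P p_s p_b(1 - p_s p_b) \\ &\qquad - 4P p_b p_s(1 - p_s) - 4P p_s p_b(1 - p_b)\Big] \\ &= \frac{1}{P}\Big[p_s(1 - p_s) + p_b(1 - p_b) - 4 p_s p_b (1 - p_s)(1 - p_b)\Big] \end{aligned} \tag{6.11} $$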
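Equations (6.9) and (6.11) can be checked by simulation under the same i.i.d. Bernoulli assumptions. The sketch below compares the closed forms with a Monte Carlo estimate from random blocks. It also confirms that (6.11) equals μ_{H_0}(1 − μ_{H_0})/P, as expected, since each pixel's disagreement indicator is itself Bernoulli with parameter μ_{H_0}. The densities, trial count, and function names are hypothetical choices for illustration.

```python
import numpy as np

def h0_moments(p_s, p_b, P):
    """Mean (6.9) and variance (6.11) of V_ijk under H0 for i.i.d. Bernoulli pixels."""
    mean = p_s * (1 - p_b) + p_b * (1 - p_s)
    var = (p_s * (1 - p_s) + p_b * (1 - p_b)
           - 4 * p_s * p_b * (1 - p_s) * (1 - p_b)) / P
    return mean, var

def simulate_h0(p_s, p_b, M, trials=5000, seed=0):
    """Monte Carlo estimate of the same moments from independent random blocks."""
    rng = np.random.default_rng(seed)
    S = rng.random((trials, M * M)) < p_s
    B = rng.random((trials, M * M)) < p_b
    V = np.count_nonzero(S != B, axis=1) / (M * M)   # equation (6.2) for each trial
    return V.mean(), V.var()

mean, var = h0_moments(0.1, 0.2, 24 * 24)
mc_mean, mc_var = simulate_h0(0.1, 0.2, 24)
print(mean, mc_mean, var, mc_var)
```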

When $H_1$ is true, the distributions of edge pixels in the two blocks are no longer independent. Ideally, they should be identical, but due to the presence of noise, $\mu_{H_1}$ will not be zero. If we assume a Bernoulli noise process, $n$, with probability $p_n$ such that

$$ s = b\bar{n} + \bar{b}n \tag{6.12} $$

then

$$ \bar{b}s + b\bar{s} = n \tag{6.13} $$

and hence

$$ \mu_{H_1} = E[V_{ijk} \mid H_1] = p_n \tag{6.14} $$

$$ \sigma^2_{H_1} = \mathrm{Var}[V_{ijk} \mid H_1] = \frac{p_n(1 - p_n)}{P} \tag{6.15} $$

One of the difficulties with using equation (6.9) to determine $\tau$ is that it requires knowledge of both $p_s$ and $p_b$. These can be estimated from a local analysis of the two edge maps; however, doing so greatly complicates the procedure. A simpler test can be formulated by counting only the edge pixels in each of the $N$ blocks from the base edge map. Since these blocks do not change, this needs to be done only once.

Given $\|B_i\|$, the distribution of $V_{ijk}$ under the null hypothesis changes slightly. We find that

$$ \mu_{H_0,\|B\|} = E[V_{ijk} \mid H_0, \|B\|] = p_s\left(1 - \frac{\|B\|}{P}\right) + \frac{\|B\|}{P}(1 - p_s) \tag{6.16} $$

and

$$ \sigma^2_{H_0,\|B\|} = \mathrm{Var}[V_{ijk} \mid H_0, \|B\|] = \frac{p_s(1 - p_s)}{P} \tag{6.17} $$

If we consider only blocks for which $\|B_i\|/P < 1/2$, then, since $p_s > 0$, it is always the case that

$$ E[V_{ijk} \mid H_0, \|B\|] > \frac{\|B\|}{P} \tag{6.18} $$

because (6.16) can be rewritten as $\|B\|/P + p_s(1 - 2\|B\|/P)$. Hence a reasonable test can be formulated as

$$ V_{ijk} < \alpha\,\frac{\|B_i\|}{P} \qquad (0 < \alpha < 1) \tag{6.19} $$

subject to

$$ \beta p_n < \frac{\|B_i\|}{P} < \frac{1}{2} \qquad (\beta > 1) \tag{6.20} $$

where $\alpha$ is chosen to ensure a low false alarm rate and $\beta$ determines the detection rate given $\alpha$ and assuming a value for $p_n$. Equations (6.14)-(6.17) can be used in conjunction with equations (6.7) and (6.8) to set values for $\alpha$ and $\beta$. In the simulations of the matching procedure which have been performed, typical values are $\alpha = 0.5$ and $\beta p_n = 0.15$.
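The recipe in the previous paragraph can be written out directly. The sketch below evaluates the Chebyshev bounds (6.7) and (6.8) for the test (6.19), using the moments (6.14)-(6.17), and implements the acceptance rule (6.19) subject to (6.20) with the typical values quoted above. The values p_s = 0.1 and p_n = 0.05 in the example are hypothetical, as are the function names. The parameter `beta_pn` stands for the product βp_n.

```python
def chebyshev_bounds(r, p_s, p_n, alpha, M):
    """Bounds (6.7)-(6.8) for the test V < alpha * r, where r = ||B_i|| / P."""
    P = M * M
    tau = alpha * r                                   # threshold used in (6.19)
    mu1, var1 = p_n, p_n * (1 - p_n) / P              # H1 moments, (6.14)-(6.15)
    mu0 = p_s * (1 - r) + r * (1 - p_s)               # H0 mean given ||B||, (6.16)
    var0 = p_s * (1 - p_s) / P                        # H0 variance given ||B||, (6.17)
    # Each bound is informative only when tau lies on the correct side of the mean.
    pd_lower = max(0.0, 1 - var1 / (tau - mu1) ** 2) if tau > mu1 else 0.0
    pf_upper = min(1.0, var0 / (mu0 - tau) ** 2) if tau < mu0 else 1.0
    return pd_lower, pf_upper

def accept_match(v_star, r, alpha=0.5, beta_pn=0.15):
    """Decision rule (6.19) subject to the edge-density limits (6.20)."""
    return beta_pn < r < 0.5 and v_star < alpha * r

pd, pf = chebyshev_bounds(r=0.3, p_s=0.1, p_n=0.05, alpha=0.5, M=24)
print(f"P_D >= {pd:.4f}, P_F <= {pf:.4f}")
```

For a 24 × 24 block with 30% edge pixels, these assumed densities give P_D of at least about 0.99 and P_F of at most about 0.004. Because Chebyshev bounds are loose, the true rates are likely better still.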

Although this procedure categorically rejects any block in the base map which does not 
first satisfy (6.20), the gain in simplicity, which directly impacts circuit complexity, is worth 
the loss of a few correspondence points. The loss due to the upper bound of 1/2 in (6.20) 
should not be too great since, on average, edges cover much less than half of the image. The 
lower bound would be necessary under any circumstances to avoid trying to match blocks 
containing very few features. 
6.2 Other Tests

The previous analysis was based on the assumption that the pattern of edges in the block 
being matched was unique. When this assumption is true, the absolute value of difference 
function has a very sharp minimum at the location of the correct match. The variance of 
the function, as given by equation (6.17), will be very small if M is large enough since, as 
is easily shown, 

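$$ \sigma_{H_0,\|B\|} \le \frac{1}{2M} \tag{6.21} $$

since $p_s(1 - p_s) \le 1/4$ and $P = M^2$. If $M = 24$, for example, then $\sigma_{H_0,\|B\|} \le 0.021$. It is not difficult to choose $\tau$ so that $(\mu_{H_0} - \tau)$ is many times larger than $\sigma_{H_0,\|B\|}$.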

It is often the case, however, that the edge patterns are not unique. Repeating patterns 
occur frequently in natural scenes. The most common example is when the block contains 
linear segments that are part of larger entities, for instance the side of a door or of a table. 
Other examples occur with regular structures such as a set of drawers, or bookshelves. 

The only practical way to deal with the problem of repeating patterns without greatly 
increasing the complexity of the procedure is to simply throw out any matches which do 
not have a single well-localized minimum. Several of the matching procedures discussed 
in Chapter 3 impose figural or continuity constraints to disambiguate multiple responses. 
However, these methods operate in software and can therefore consider global information. 
A procedure implemented by specialized VLSI circuits can use only local information. 

Combining the test for a localized minimum with the restrictions (6.20) on the fraction 
of allowable base edge pixels and with the threshold test (6.19), only a relatively small 
percentage of the N blocks actually generate acceptable matches. It is therefore important 
to make N as large as is reasonably possible in order to ensure enough correspondence 
points are found to obtain a good estimate of the motion. It should be noted, however, 
that in spite of all these restrictions, there will still be errors that cannot be avoided. The 
purpose of the tests is to minimize the probability of these errors so that their effect on 
the motion estimates is minor. Additional steps can be taken, once a good estimate of 
the epipolar geometry has been obtained, to remove the remaining erroneous matches by 
identifying points which are significantly off the epipolar lines. 
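The sketch below combines the tests for a single base block. The edge-density limits (6.20) are applied first. Next comes an exhaustive search of $V_{ijk}$ over a square window, followed by the threshold test (6.19). Last is a localization test. The text does not define the localization test precisely. Here it is assumed, hypothetically, to reject the match whenever any offset below the threshold lies more than `radius` pixels from the best offset. The function name, argument layout, and `radius` parameter are all illustrative assumptions. The sketch models the decision logic only, not the VLSI correlator.

```python
import numpy as np

def match_block(base, second, top, left, M, search, alpha=0.5, beta_pn=0.15, radius=2):
    """Return the top-left (row, col) of the matching block in `second`, or None."""
    B = base[top:top + M, left:left + M]
    r = B.mean()                                   # ||B_i|| / P
    if not (beta_pn < r < 0.5):                    # edge-density limits (6.20)
        return None
    tau = alpha * r                                # threshold of (6.19)
    scores = {}
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + M <= second.shape[0] and x + M <= second.shape[1]:
                S = second[y:y + M, x:x + M]
                scores[(y, x)] = np.count_nonzero(S != B) / B.size   # V_ijk, (6.2)
    if not scores:
        return None
    best = min(scores, key=scores.get)
    if scores[best] >= tau:                        # threshold test (6.19)
        return None
    # Hypothetical localization test: every offset below tau must lie near the best one.
    for (y, x), v in scores.items():
        if v < tau and max(abs(y - best[0]), abs(x - best[1])) > radius:
            return None
    return best

rng = np.random.default_rng(1)
base = rng.random((100, 100)) < 0.25
second = np.roll(base, (3, -5), axis=(0, 1))       # known shift: 3 rows down, 5 columns left
found = match_block(base, second, top=40, left=40, M=24, search=10)

stripes = np.zeros((100, 100), dtype=bool)
stripes[:, 0::6] = True
stripes[:, 1::6] = True                            # repeating pattern with period 6
ambiguous = match_block(stripes, stripes, top=40, left=40, M=24, search=10)
print(found, ambiguous)
```

A random-dot block, like the astronaut edge maps, yields one sharp minimum at the true shift. The striped block, like the repeating structures in the lab scene, produces several well-separated zero-score offsets, so it is rejected.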
6.3 Simulations

The matching procedure has been tested on dozens of different image sequences. The 
astronaut and lab image sequences, shown previously in Figures 5-6 and 5-7, were chosen 
to illustrate several of the issues which have been raised in this chapter. 

In applying the matching procedure to the astronaut sequence, the edge map from the 
left image was divided into 400 blocks whose centers were regularly spaced in a 20x20 array 
on the pixel grid. Each block measured 24x24 pixels, and the search in the right image 
was conducted over an area of 120x120 pixels surrounding the coordinates of the center 
of each block. The correspondences which were found are numbered sequentially and their 
locations are shown superimposed on the edge maps in Figure 6-2. The numbered locations 
of the correspondences are also displayed by themselves directly below the edge maps to 
aid the reader in finding them. 

Of the 400 blocks, 135 passed all the tests and generated acceptable matches. The quality of the matches which did pass the tests can be seen to be quite good. They are all correct to within the offset error that may arise from approximating the correspondence at the center of the area covered by the block in the second image. A distinguishing feature of the astronaut images is the absence of repeating patterns. In fact, the edge maps have almost the appearance of random-dot images, and as a result, few blocks were rejected for not producing a well-localized minimum. The vast majority of those which were rejected either failed the threshold test (6.19) or did not have the edge density required to pass the test of (6.20).

The second sequence, composed of images taken in our laboratory, is a very different 
situation. Many of the objects in the scene, i.e., the bookshelves, workstation monitors, and 
tripods, have long linear features for which it is impossible to find the correct match with 
any certainty using a windowing method. There are also regular repeating patterns, such as 
the supports on the bookshelves and the drawer handles, which generate multiple candidate 
matches when more than one instance is included in the search window. In addition, the 
motion, which includes a z axis rotation, complicates things even more by introducing a 
relative tilt in the edge patterns. 

The matching procedure was executed on these images by dividing the left image into 900 blocks (in a 30x30 array), each measuring 24x24 pixels. The search was conducted over an area of 200 (horizontal) × 60 (vertical) pixels. Of the 900 blocks, 49 produced acceptable matches according to the different tests in the procedure. These are shown in Figure 6-3.

Figure 6-2: Binary edge maps of astronaut sequence with correspondence points found by the matching procedure.
Figure 6-3: Binary edge maps of real motion sequence (lab scene) with correspondence points found by the matching procedure.

As in the astronaut sequence, most of these matches are very good. However, the 
proportion of blocks generating acceptable matches is much lower, and there are some 
obvious errors, such as points 13, 20, 32, and 46. The first three of these points are simply 
weak matches that passed the localization test only because they had a single minimum 
marginally below threshold, while the other minima were marginally above threshold. The 
last mismatched point, #46, demonstrates a different, but frequently encountered problem. 
This point lies near the border of the left image, on the lower right corner of the workstation monitor, which is not in the field of view in the second image. If the full monitor had been
visible in the second image, the localization test would have rejected the match since there 
are several positions where a low score could be obtained. Instead, however, the wrong 
match was accepted. 

Lowering the detection threshold is not a good solution for removing the marginal cases 
which slip past the localization test. This has the effect only of reducing the total number of 
matches, without changing the fact that the threshold can still fall in between the minima as 
in the cases above. In fact, there is not a simple solution at this level for removing the bad 
matches which escape detection without compromising the generality of the procedure. In 
the next chapter we will see how these false matches affect the computed motion estimates. 



Chapter 7 



Solving the Motion Equations 



As previously discussed in Section 2.2, the basic procedure for computing general camera motion, or relative orientation, given a set of point correspondences $\{(r_i, \ell_i)\}$, $i = 1, \ldots, N$, is to find the rotation and baseline direction which minimize the sum of squared errors

$$ S = \sum_{i=1}^{N} \Delta_i^2 \tag{7.1} $$

where $\Delta_i$ is the triple product given in equation (2.34) as

$$ \Delta_i = R\ell_i \cdot (r_i \times b) \tag{7.2} $$

Since the measurements of the locations of the correspondence points are not always equally reliable, it is often appropriate to define $S$ as the weighted sum

$$ S = \sum_{i=1}^{N} w_i \Delta_i^2 \tag{7.3} $$

where the weights $\{w_i\}$, $0 \le w_i \le 1$, reflect the relative confidences in the data.

The methods presented in Section 2.2 for minimizing (7.3) were developed to be exe- 
cuted on powerful digital computers where memory and power consumption limitations are 
not significant constraints. In this chapter a simplified algorithm which is much more suit- 
able for implementation on low-level hardware, such as a programmable microcontroller, is 
presented. The algorithm, which is based on an adaptation of Horn's second method [4], 
is also an iterative nonlinear constrained minimization procedure. However, it breaks up
the problem by alternating between updating the rotation and baseline, and in doing so, 
considerably reduces the size and complexity of the operations. The largest matrix which 
must be handled is 4x4, and the most complex operation at each iteration is solving a 3x3 
eigenvalue-eigenvector problem. 

It is not sufficient to present a method for computing camera motion without discussing 
problems of stability. There are several well known and analyzed cases in which the mini- 
mization problem is numerically unstable and which allow multiple solutions for the motion 
parameters. If we are going to build a robust system, we must be able to recognize and 
avoid these cases. In the next chapter, I will derive analytically the conditions for the 
function S to have more than one local minimum — even with (almost) perfect data — and 
will develop a test for determining when the solution found by the algorithm is indeed a 
reliable estimate of the true motion. In this chapter, I will merely introduce the subject of 
instability and multiple solutions and will present without proof the test for determining 
reliability. In the last section, I will present results which demonstrate both correct and 
incorrect convergence of the algorithm using the data from the astronaut and lab image 
sequences given in the preceding chapters.

7.1 The Simplified Algorithm 

In Horn's method, which was briefly discussed in Section 2.2, the rotation, represented 
by the unit quaternion q, and the baseline, represented indirectly by the quaternion d = bq, 
were updated simultaneously. This resulted in an 11x11 system of linear equations to be 
solved at each iteration in which three of the unknowns were the Lagrange multipliers from 
the constraint terms. 

If, however, the motion is a pure rotation, or if either the baseline or the rotation is 
known, the problem becomes much easier. The simplified algorithm is based on the fact that 
by alternately solving these easier subproblems, assuming the values for q and b from the 
previous iteration, the estimates of the motion parameters will converge to those obtained 
from the more complex method in which q and b are updated simultaneously. 

In this section, I will first present the procedures for solving the special cases and then 
combine these into a complete algorithm. Note that in the following derivations, the weights $w_i$ are assumed to be constant.
7.1.1 Pure translation or known rotation

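The triple product $\lambda_i$ may be written as

$$\lambda_i = \mathbf{b}\cdot(\boldsymbol{\ell}'_i\times\mathbf{r}_i) \qquad (7.4)$$

where $\boldsymbol{\ell}'_i$ denotes the $i$th left ray rotated into the right coordinate system. Let

$$\mathbf{c}_i = \boldsymbol{\ell}'_i\times\mathbf{r}_i \qquad (7.5)$$

The weighted sum of squares can then be expressed as

$$S = \sum_{i=1}^N w_i\lambda_i^2 = \sum_{i=1}^N w_i(\mathbf{b}\cdot\mathbf{c}_i)^2 = \mathbf{b}^T\left(\sum_{i=1}^N w_i\,\mathbf{c}_i\mathbf{c}_i^T\right)\mathbf{b} = \mathbf{b}^T C\,\mathbf{b} \qquad (7.6)$$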

If the rotation is given, or assumed, $C$ may be treated as a constant matrix. It is a straightforward result from linear algebra that $C$, being a sum of the dyadic products $\mathbf{c}_i\mathbf{c}_i^T$ with non-negative weights, is symmetric and positive semi-definite (positive definite whenever the $\mathbf{c}_i$ span all three dimensions). The unit vector $\mathbf{b}$ which minimizes $S$ is the eigenvector of $C$ corresponding to its smallest eigenvalue [3].
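The closed-form step above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in this work. The name `update_b`, the convention of storing one ray per row, and the choice of `numpy.linalg.eigh` (which returns eigenvalues in ascending order) are all assumptions.

```python
import numpy as np


def update_b(R, left, right, w=None):
    """Sketch of Section 7.1.1: best unit baseline for a fixed rotation R.

    left, right: (N, 3) arrays holding the left rays l_i and right rays r_i.
    Returns (b, S): the unit eigenvector of C for its smallest eigenvalue,
    and that eigenvalue, which equals S = b^T C b for the returned b.
    """
    w = np.ones(len(left)) if w is None else np.asarray(w, dtype=float)
    lp = left @ R.T                      # l'_i = R l_i, one ray per row
    c = np.cross(lp, right)              # c_i = l'_i x r_i              (7.5)
    C = (c * w[:, None]).T @ c           # C = sum_i w_i c_i c_i^T        (7.6)
    evals, evecs = np.linalg.eigh(C)     # symmetric: eigenvalues ascending
    return evecs[:, 0], float(evals[0])
```

Eigenvectors are defined only up to sign, so the returned $\mathbf{b}$ may come back as $-\mathbf{b}$. This is the sign ambiguity discussed in Section 7.2.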

7.1.2 Rotation with known translation 

There is no closed-form solution equivalent to the one above for finding the rotation when $\mathbf{b}$ is given, or assumed. It is possible, nonetheless, to solve the minimization problem by means of a simple iterative procedure starting from an initial guess, $R_0$, for the rotation. At each iteration, $k$, we compute the incremental adjustment to the rotation, $\delta R$, such that

$$R_{k+1} = \delta R\,R_k \qquad (7.7)$$

and

$$S(R_{k+1}) < S(R_k). \qquad (7.8)$$

The procedure stops when the relative decrease in $S$ is smaller than a given tolerance.
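The incremental adjustment rotates each of the rays $\boldsymbol{\ell}'_i$ through an additional angle $\delta\theta$ about an axis $\hat{\boldsymbol{\eta}}$. From Rodrigues' formula, equation (2.19), the new ray directions are given by

$$\delta R\,\boldsymbol{\ell}'_i = \boldsymbol{\ell}'_i + \sin\delta\theta\,(\hat{\boldsymbol{\eta}}\times\boldsymbol{\ell}'_i) + (1-\cos\delta\theta)\,\hat{\boldsymbol{\eta}}\times(\hat{\boldsymbol{\eta}}\times\boldsymbol{\ell}'_i) \qquad (7.9)$$

which can be approximated to first order in $\delta\theta$ by

$$\delta R\,\boldsymbol{\ell}'_i \approx \boldsymbol{\ell}'_i + \delta\theta\,(\hat{\boldsymbol{\eta}}\times\boldsymbol{\ell}'_i) \qquad (7.10)$$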

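The error at the end of iteration $k$ is therefore

$$\lambda_{i,k+1} = \mathbf{b}\cdot(\delta R\,\boldsymbol{\ell}'_i\times\mathbf{r}_i) = \lambda_{i,k} + \delta\theta\,\big((\mathbf{b}\times\mathbf{r}_i)\times\boldsymbol{\ell}'_i\big)\cdot\hat{\boldsymbol{\eta}} \qquad (7.11)$$

Let

$$\mathbf{a}_i = (\mathbf{b}\times\mathbf{r}_i)\times\boldsymbol{\ell}'_i \quad\text{and}\quad \mathbf{m} = -\delta\theta\,\hat{\boldsymbol{\eta}} \qquad (7.12)$$

so that

$$\lambda_{i,k+1} = \lambda_{i,k} - \mathbf{a}_i^T\mathbf{m} \qquad (7.13)$$

The total error is then

$$S_{k+1} = \sum_{i=1}^N w_i\left[\lambda_{i,k}^2 - 2\lambda_{i,k}\,\mathbf{a}_i^T\mathbf{m} + (\mathbf{a}_i^T\mathbf{m})^2\right] = S_k - 2\mathbf{h}^T\mathbf{m} + \mathbf{m}^T A\,\mathbf{m} \qquad (7.14)$$

where we have defined

$$\mathbf{h} = \sum_{i=1}^N w_i\lambda_{i,k}\,\mathbf{a}_i \qquad (7.15)$$

and

$$A = \sum_{i=1}^N w_i\,\mathbf{a}_i\mathbf{a}_i^T \qquad (7.16)$$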

Except in pathological cases, such as a field of view of zero or $N < 3$, $A$ will be invertible and positive definite. Accordingly, equation (7.14) possesses a unique minimum when

$$\mathbf{m} = A^{-1}\mathbf{h} \qquad (7.17)$$



CHAPTER 7. SOLVING THE MOTION EQUATIONS 93 

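and hence the $\delta\theta$ and $\hat{\boldsymbol{\eta}}$ which minimize $S$ to first order are given by

$$\delta\theta = \|\mathbf{m}\| \qquad (7.18)$$

and

$$\hat{\boldsymbol{\eta}} = -\frac{\mathbf{m}}{\|\mathbf{m}\|} \qquad (7.19)$$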

In order to preserve orthonormality, $\delta R$ should be computed exactly from $\delta\theta$ and $\hat{\boldsymbol{\eta}}$ using Rodrigues' formula, equation (2.17), without approximation. Alternatively, one can maintain the rotation in unit quaternion form by computing

$$\delta\mathbf{q} = \left(\cos\frac{\delta\theta}{2},\ \hat{\boldsymbol{\eta}}\sin\frac{\delta\theta}{2}\right) \qquad (7.20)$$

and

$$\mathbf{q}_{k+1} = \delta\mathbf{q}\,\mathbf{q}_k \qquad (7.21)$$

The rules for transforming between unit quaternions and orthonormal matrices are given in Appendix A.
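A minimal Python/NumPy sketch of the iterative procedure is given below. It keeps the rotation as a unit quaternion, as suggested by (7.20) and (7.21). The helper names, the (w, x, y, z) storage order, the Hamilton product convention, and the tolerance defaults are assumptions of this sketch; Appendix A gives the conversion rules actually used. A step is rejected if it would increase $S$, so that (7.8) holds.

```python
import numpy as np


def quat_mul(p, q):
    """Hamilton product pq; quaternions stored as (w, x, y, z)."""
    pw, pv = p[0], np.asarray(p[1:], dtype=float)
    qw, qv = q[0], np.asarray(q[1:], dtype=float)
    return np.concatenate(([pw * qw - pv @ qv], pw * qv + qw * pv + np.cross(pv, qv)))


def quat_to_matrix(q):
    """Orthonormal R such that R v is the vector part of q (0, v) q*."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def update_q(q, b, left, right, w=None, tol=1e-10, max_iter=50):
    """Sketch of Section 7.1.2: refine the rotation q for a fixed baseline b."""
    w = np.ones(len(left)) if w is None else np.asarray(w, dtype=float)
    q = np.asarray(q, dtype=float)

    def residuals(q):
        lp = left @ quat_to_matrix(q).T          # l'_i = R l_i
        return lp, np.cross(lp, right) @ b       # lambda_i = b . (l'_i x r_i)

    lp, lam = residuals(q)
    S = float(w @ lam**2)
    for _ in range(max_iter):
        a = np.cross(np.cross(b, right), lp)     # a_i = (b x r_i) x l'_i     (7.12)
        h = (w * lam) @ a                        # (7.15)
        A = (a * w[:, None]).T @ a               # (7.16)
        m = np.linalg.solve(A, h)                # m = A^{-1} h               (7.17)
        dtheta = np.linalg.norm(m)               # (7.18)
        if dtheta == 0.0:
            break
        eta = -m / dtheta                        # (7.19)
        dq = np.concatenate(([np.cos(dtheta / 2)], np.sin(dtheta / 2) * eta))  # (7.20)
        q_new = quat_mul(dq, q)                  # (7.21)
        q_new /= np.linalg.norm(q_new)           # remove round-off drift from unit length
        lp_new, lam_new = residuals(q_new)
        S_new = float(w @ lam_new**2)
        if S_new >= S:                           # keep (7.8): never accept an increase
            break
        rel_decrease = (S - S_new) / S
        q, lp, lam, S = q_new, lp_new, lam_new, S_new
        if rel_decrease < tol:                   # stopping rule of Section 7.1.2
            break
    return q, S
```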

7.1.3 Pure rotation (|b| = 0) 

If the motion is a pure rotation, the procedure just described will still work given an arbitrary value for $\mathbf{b}$, but there is a simpler closed-form method which can be applied. When $|\mathbf{b}| = 0$ we have, going back to the notation of equation (2.4) in Chapter 2,

$$\mathbf{p}_{r_i} = R\,\mathbf{p}_{\ell_i} \qquad (7.22)$$

By the length-preserving property of rotations,

$$|\mathbf{p}_{r_i}| = |\mathbf{p}_{\ell_i}| \qquad (7.23)$$

If spherical projection (2.15) is used so that

$$\mathbf{r}_i = \frac{\mathbf{p}_{r_i}}{|\mathbf{p}_{r_i}|} \quad\text{and}\quad \boldsymbol{\ell}_i = \frac{\mathbf{p}_{\ell_i}}{|\mathbf{p}_{\ell_i}|} \qquad (7.24)$$

it will also be true that

$$\mathbf{r}_i = R\,\boldsymbol{\ell}_i = \boldsymbol{\ell}'_i \qquad (7.25)$$

There are two possibilities for finding the rotation that best satisfies (7.25) in a least-squares sense for the $N$ correspondence points. The first is to define the error

$$\boldsymbol{\epsilon}_i = \mathbf{r}_i\times\boldsymbol{\ell}'_i \qquad (7.26)$$

and minimize

$$S = \sum_{i=1}^N w_i\,|\boldsymbol{\epsilon}_i|^2 \qquad (7.27)$$

This formulation leads directly to a procedure similar to that defined previously for the case of known translation.

The second method is to note that ideally

$$\mathbf{r}_i\cdot\boldsymbol{\ell}'_i = 1 \qquad (7.28)$$

so that we can also solve for the rotation which maximizes

$$S' = \sum_{i=1}^N w_i\,(\mathbf{r}_i\cdot\boldsymbol{\ell}'_i) \qquad (7.29)$$

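This formulation was previously used by Horn in an algorithm to compute absolute orientation¹ [88]. Writing (7.29) with the rotation expressed by unit quaternions we have

$$S' = \sum_{i=1}^N w_i\,\big(\mathring{\mathbf{r}}_i\cdot\mathbf{q}\mathring{\boldsymbol{\ell}}_i\mathbf{q}^*\big) = \sum_{i=1}^N w_i\,\big(\mathring{\mathbf{r}}_i\mathbf{q}\cdot\mathbf{q}\mathring{\boldsymbol{\ell}}_i\big) \qquad (7.30)$$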

This expression can be cast into a more convenient form by introducing quaternion matrices. 



¹ The difference between absolute and relative orientation is that in the former the distances to objects in the scene are known. Consequently one can vectorially subtract the translation once it has been computed to arrive at the pure rotation case. Note that this cannot be done for relative orientation since absolute distances are not known. Hence this method is only applicable if in fact the motion is a pure rotation.
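As shown in Appendix A, if $\mathring{\mathbf{a}}$ and $\mathring{\mathbf{b}}$ are two quaternions then

$$\mathring{\mathbf{a}}\mathring{\mathbf{b}} = \mathbb{A}\,\mathring{\mathbf{b}} = \bar{\mathbb{B}}\,\mathring{\mathbf{a}} \qquad (7.31)$$

$\mathbb{A}$ is referred to as the left quaternion matrix associated with $\mathring{\mathbf{a}}$, and $\bar{\mathbb{B}}$ is referred to as the right quaternion matrix associated with $\mathring{\mathbf{b}}$. Using (7.31), $S'$ can be rewritten as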

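$$S' = \sum_{i=1}^N w_i\,(\mathbb{R}_i\mathbf{q})^T\bar{\mathbb{L}}_i\mathbf{q} = -\sum_{i=1}^N w_i\,\mathbf{q}^T M_i\,\mathbf{q} \qquad (7.32)$$

where $M_i = \mathbb{R}_i\bar{\mathbb{L}}_i$. The minus sign arises from the fact that $\mathbb{R}_i^T = -\mathbb{R}_i$, since $\mathring{\mathbf{r}}_i$ is a purely imaginary quaternion. We can remove $\mathbf{q}$ from the summation to obtain

$$S' = -\mathbf{q}^T\left(\sum_{i=1}^N w_i M_i\right)\mathbf{q} = -\mathbf{q}^T M\,\mathbf{q} \qquad (7.33)$$

using $M$ to represent the sum in parentheses.

$S'$ is thus maximized by identifying $\mathbf{q}$ with the eigenvector of $M$ corresponding to its most negative eigenvalue. Since $M$ is a 4x4 matrix, there is in principle a closed-form solution for $\mathbf{q}$ (its characteristic polynomial is a quartic), although it may be simpler to obtain the result by a standard iterative procedure.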
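The eigenvector solution can be sketched as follows. The sketch assumes unit-length (spherically projected) rays, quaternions stored as (w, x, y, z), and the Hamilton product. The explicit 4x4 left and right quaternion matrices written out here are the standard ones for that convention and are assumed to match Appendix A. All function names are hypothetical.

```python
import numpy as np


def left_matrix(p):
    """4x4 matrix P with p q == P @ q (Hamilton product, (w, x, y, z) order)."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w, -z,  y],
                     [y,  z,  w, -x],
                     [z, -y,  x,  w]])


def right_matrix(p):
    """4x4 matrix P_bar with q p == P_bar @ q."""
    w, x, y, z = p
    return np.array([[w, -x, -y, -z],
                     [x,  w,  z, -y],
                     [y, -z,  w,  x],
                     [z,  y, -x,  w]])


def pure_rotate(left, right, w=None):
    """Sketch of Section 7.1.3: unit q maximizing S' = sum_i w_i r_i . (R l_i)."""
    w = np.ones(len(left)) if w is None else np.asarray(w, dtype=float)
    M = np.zeros((4, 4))
    for wi, l, r in zip(w, left, right):
        R_i = left_matrix(np.concatenate(([0.0], r)))       # r_i q = R_i q
        L_i_bar = right_matrix(np.concatenate(([0.0], l)))  # q l_i = L_i_bar q
        M += wi * R_i @ L_i_bar                              # M_i of (7.32)
    evals, evecs = np.linalg.eigh(M)                         # M is symmetric
    return evecs[:, 0]                                       # most negative eigenvalue (7.33)
```

As with any eigenvector, the result is determined only up to sign. This is harmless here because $\mathbf{q}$ and $-\mathbf{q}$ represent the same rotation.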

7.1.4 The complete algorithm 

Combining the procedures for the special cases we can formulate an algorithm to solve 
for the general case of unknown translation and rotation as follows: 

    Input: q(0), b(0), data
    if b(0) = 0
        q = PURE_ROTATE(data)
    else {
        k = 0
        S(0) = Σ_{i=1..N} w_i (λ_i(0))²
        change = 1
        while (change > ε) {
            q(k+1) = UPDATE_Q(q(k), b(k), data)
            (b(k+1), S(k+1)) = UPDATE_B(q(k+1), data)
            change = (S(k) - S(k+1)) / S(k)
            k++
        }
    }



Pure_Rotate(), Update_b(), and Update_q() correspond to the procedures described in Sections 7.1.3, 7.1.1, and 7.1.2, respectively. It is necessary to start the algorithm with initial values for $\mathbf{b}$ and $\mathbf{q}$. These can be provided either externally or by obtaining $\mathbf{q}^{(0)}$ from Pure_Rotate() and $\mathbf{b}^{(0)}$ from Update_b() with $\mathbf{q}$ set to $(1, 0, 0, 0)$. The weighted sum of squares, $S^{(k)}$, decreases monotonically with each iteration, since neither Update_q() nor Update_b() can increase it. Since $S$ is bounded below by zero, the algorithm will converge to some local minimum or stationary point.
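For concreteness, the alternation can be sketched in Python with NumPy as below. It is a self-contained illustration, not the thesis implementation. The function names, the tolerances, the iteration cap `max_iter`, and the quaternion conventions are assumptions. The pure-rotation branch (PURE_ROTATE) is omitted, so a nonzero initial baseline is required. UPDATE_Q is implemented as a few first-order steps with $\mathbf{b}$ held fixed, and UPDATE_B as the eigenvector computation of Section 7.1.1.

```python
import numpy as np


def quat_mul(p, q):
    """Hamilton product pq of (w, x, y, z) arrays."""
    return np.concatenate(([p[0] * q[0] - p[1:] @ q[1:]],
                           p[0] * q[1:] + q[0] * p[1:] + np.cross(p[1:], q[1:])))


def quat_to_matrix(q):
    """Orthonormal R such that R v is the vector part of q (0, v) q*."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def sum_sq(q, b, left, right, w):
    """Weighted sum of squares S, rotated rays l'_i, and lambda_i (7.4)."""
    lp = left @ quat_to_matrix(q).T
    lam = np.cross(lp, right) @ b
    return float(w @ lam**2), lp, lam


def update_b(q, left, right, w):
    """Section 7.1.1: eigenvector of C for its smallest eigenvalue."""
    c = np.cross(left @ quat_to_matrix(q).T, right)
    evals, evecs = np.linalg.eigh((c * w[:, None]).T @ c)
    return evecs[:, 0], max(float(evals[0]), 0.0)   # clamp round-off below zero


def update_q(q, b, left, right, w, tol=1e-10, max_iter=20):
    """Section 7.1.2: first-order rotation steps with b held fixed."""
    S, lp, lam = sum_sq(q, b, left, right, w)
    for _ in range(max_iter):
        a = np.cross(np.cross(b, right), lp)                          # (7.12)
        m = np.linalg.solve((a * w[:, None]).T @ a, (w * lam) @ a)    # (7.15)-(7.17)
        dtheta = np.linalg.norm(m)                                    # (7.18)
        if dtheta == 0.0:
            break
        dq = np.concatenate(([np.cos(dtheta / 2)], -np.sin(dtheta / 2) * m / dtheta))
        q_new = quat_mul(dq, q)                                       # (7.20)-(7.21)
        q_new /= np.linalg.norm(q_new)
        S_new, lp_new, lam_new = sum_sq(q_new, b, left, right, w)
        if S_new >= S:                                                # reject steps that raise S
            break
        rel = (S - S_new) / S
        q, S, lp, lam = q_new, S_new, lp_new, lam_new
        if rel < tol:
            break
    return q


def solve_motion(q0, b0, left, right, w=None, eps=1e-12, max_iter=1000):
    """Alternate UPDATE_Q and UPDATE_B as in the pseudocode (b0 != 0 assumed)."""
    w = np.ones(len(left)) if w is None else np.asarray(w, dtype=float)
    q = np.asarray(q0, dtype=float)
    b = np.asarray(b0, dtype=float) / np.linalg.norm(b0)
    S = sum_sq(q, b, left, right, w)[0]                               # S(0)
    change, k = 1.0, 0
    while change > eps and k < max_iter:
        q = update_q(q, b, left, right, w)
        b, S_new = update_b(q, left, right, w)
        change = (S - S_new) / S if S > 0 else 0.0
        S, k = S_new, k + 1
    return q, b, S
```

The loop stops once the relative change in $S$ is no longer greater than `eps`, or after `max_iter` passes. With noise-free data, the first condition is eventually met when $S$ reaches round-off level.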



7.2 Ambiguities and Multiple Solutions 

There are four fundamental ambiguities associated with any pair (q, b) which minimize 
the weighted sum of squares $S$. Since the equations involve only quadratic forms, $S$ is unchanged by multiplying either $\mathbf{q}$ or $\mathbf{b}$ by $-1$. The solution $-\mathbf{q}$ is trivial since it corresponds to the same rotation as $\mathbf{q}$. Changing the sign of $\mathbf{b}$, however, reverses the direction of the
baseline which also affects the sign of the Z coordinates computed for objects in the scene. 

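A more subtle ambiguity occurs by imposing an additional rotation of $\pi$ radians about the baseline, which is equivalent to replacing $\mathbf{q}$ by $\mathbf{d} = \mathring{\mathbf{b}}\mathbf{q}$. We previously derived in equation (2.39) that

$$\lambda_i = \mathring{\mathbf{r}}_i\mathring{\mathbf{b}}\mathbf{q}\cdot\mathbf{q}\mathring{\boldsymbol{\ell}}_i \qquad (7.34)$$

and it is easily verified from the identities of Appendix A that replacing $\mathbf{q}$ by $\mathring{\mathbf{b}}\mathbf{q}$ results in

$$\mathring{\mathbf{r}}_i\mathring{\mathbf{b}}(\mathring{\mathbf{b}}\mathbf{q})\cdot(\mathring{\mathbf{b}}\mathbf{q})\mathring{\boldsymbol{\ell}}_i = -\mathring{\mathbf{r}}_i\mathbf{q}\cdot\mathring{\mathbf{b}}\mathbf{q}\mathring{\boldsymbol{\ell}}_i = -\mathring{\mathbf{r}}_i\cdot\mathring{\mathbf{b}}\mathring{\boldsymbol{\ell}}'_i = -\lambda_i \qquad (7.35)$$

and hence will give the same total squared error.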

For any solution $(\mathbf{q}, \mathbf{b})$, therefore, $(\mathbf{q}, -\mathbf{b})$, $(\mathbf{d}, \mathbf{b})$, and $(\mathbf{d}, -\mathbf{b})$ are also solutions. Each is derivable from the others, however, and only one should be feasible given the constraint that the imaged points are visible to both cameras. It is conventional therefore to count these four solutions as one [7].
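The sign flip in (7.35) can be checked numerically. The sketch below works with rotation matrices rather than quaternions. Premultiplying $\mathbf{q}$ by the unit quaternion $\mathring{\mathbf{b}}$ corresponds to an extra rotation by $\pi$ about $\mathbf{b}$, whose matrix is $2\mathbf{b}\mathbf{b}^T - I$. The function names are hypothetical. Random rays are used because the identity holds for arbitrary data, not only for consistent correspondences.

```python
import numpy as np


def lambdas(R, b, left, right):
    """Triple products lambda_i = b . (R l_i x r_i), as in (7.4)."""
    return np.cross(left @ R.T, right) @ b


def half_turn(b):
    """Rotation by pi about the unit vector b: 2 b b^T - I."""
    b = np.asarray(b, dtype=float)
    return 2.0 * np.outer(b, b) - np.eye(3)
```

Replacing $R$ by `half_turn(b) @ R` (the solution $\mathbf{d}$) negates every $\lambda_i$, and so does replacing $\mathbf{b}$ by $-\mathbf{b}$. All four combinations therefore give the same $S$.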

If the data are error-free, a finite number of solutions exists to the non-least-squares problem of solving $\lambda_i = 0$ for all $i = 1, \ldots, N$ if there are at least five distinct ray pairs. Faugeras and Maybank [7] showed that in general there are 10 solutions to the five-point problem. If at least eight pairs are available, the solution is unique for most configurations of points.
However, as first shown by Tsai and Huang [28] and Longuet-Higgins [8], there are con- 
figurations for which multiple solutions exist. Horn showed that only hyperboloids of one 
sheet and their degenerate forms viewed from a point on their surface could allow multiple 
interpretations [89], while Negahdaripour further demonstrated that only certain types of 
hyperboloids of one sheet and their degeneracies can result in an ambiguity, and in these 
cases, there are at most three possible solutions [40]. 

A more important concern for the present system is the fact that the function S may 
contain multiple local minima into which the algorithm can be trapped. The surfaces which 
give rise to multiple solutions of the equation $S = 0$ are rarely encountered in practice, and
are even less likely to arise by chance due to errors in the matching process. However, as will 
be demonstrated analytically in the next chapter, many environments can, in a statistical 
sense, have a depth distribution which mimics the effects of those of the special surfaces, 
and can thus generate multiple local minima. 

The conditions under which multiple solutions to the minimization problem most fre- 
quently arise are well known to be a function of the type of motion. Daniilidis and Nagel [6] 
derived analytically the conditions for instability in the case of pure translational motion 
or of translation with known rotation. They found that the extreme case occurs when the 
translation vector is parallel to the image plane and is accentuated as the field of view 
narrows. They as well as others (Spetsakis and Aloimonos [12], Horn [3], Weng et al. [90]) 
have proposed changing the error norm that is minimized to weight only the perpendicular 
distance of a point from its epipolar line in order to reduce the chance of convergence to an 
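alternate minimum. Horn derived the symmetric weighting term in [3] as

$$w_i = \frac{\sigma_0^2\,|\mathbf{r}_i\times\boldsymbol{\ell}'_i|^2}{\big[(\mathbf{b}\times\mathbf{r}_i)\cdot(\mathbf{r}_i\times\boldsymbol{\ell}'_i)\big]^2|\boldsymbol{\ell}'_i|^2\sigma_{\ell_i}^2 + \big[(\mathbf{b}\times\boldsymbol{\ell}'_i)\cdot(\mathbf{r}_i\times\boldsymbol{\ell}'_i)\big]^2|\mathbf{r}_i|^2\sigma_{r_i}^2} \qquad (7.36)$$

where $\sigma_{r_i}^2$ and $\sigma_{\ell_i}^2$ represent the variances of the right and left rays in the $i$th measurement, and $\sigma_0^2$ is an arbitrary constant which is included to maintain consistency in the units.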

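One can gain a better understanding of equation (7.36) by going back to the rigid body motion equation which, even with imperfect data, must be approximately satisfied:

$$\mathbf{p}_{r_i} \approx R\,\mathbf{p}_{\ell_i} + \mathbf{b} \qquad (7.37)$$

Writing $\mathbf{r}_i = \mathbf{p}_{r_i}/Z_{r_i}$ and $\boldsymbol{\ell}_i = \mathbf{p}_{\ell_i}/Z_{\ell_i}$, we can see that

$$\mathbf{b}\times\mathbf{r}_i \approx Z_{\ell_i}(\mathbf{r}_i\times\boldsymbol{\ell}'_i) \quad\text{and}\quad \mathbf{b}\times\boldsymbol{\ell}'_i \approx Z_{r_i}(\mathbf{r}_i\times\boldsymbol{\ell}'_i) \qquad (7.38)$$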

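and hence equation (7.36) can be written as

$$w_i \approx \frac{\sigma_0^2}{|\mathbf{r}_i|^2|\boldsymbol{\ell}'_i|^2\sin^2\alpha_i\,\big(Z_{r_i}^2|\mathbf{r}_i|^2\sigma_{r_i}^2 + Z_{\ell_i}^2|\boldsymbol{\ell}'_i|^2\sigma_{\ell_i}^2\big)} \qquad (7.39)$$

where $\alpha_i$ is the angle between $\mathbf{r}_i$ and $\boldsymbol{\ell}'_i$.

For rays corresponding to points approaching infinity, $Z_{r_i} \approx Z_{\ell_i}$ and $\alpha_i \to 0$ as $1/Z_{r_i}$, so that $Z_{r_i}^2\sin^2\alpha_i$ tends to a finite limit, resulting in

$$w_i \to \frac{\sigma_0^2}{|\mathbf{r}_i|^2|\boldsymbol{\ell}'_i|^2\big(\lim Z_{r_i}^2\sin^2\alpha_i\big)\big(|\mathbf{r}_i|^2\sigma_{r_i}^2 + |\boldsymbol{\ell}'_i|^2\sigma_{\ell_i}^2\big)} \qquad (7.40)$$

However, for points near the cameras, assuming their relative angle of rotation is less than 90°, $\alpha_i$ becomes larger as $Z_{r_i}$ and $Z_{\ell_i}$ go to zero, so that in the limit, $w_i \to \infty$.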

Equation (7.36) thus correctly weights rays corresponding to points closer to the cameras 
more strongly than those corresponding to far away points. Unfortunately, it is necessary to 
know either the epipolar geometry or the Z coordinates of the matched points in advance in 
order to use this equation. If the $w_i$ are computed from the current estimate of the baseline and rotation, then it is not possible to prove that the algorithm will converge to an unbiased estimate of the correct solution.

In an automated system for computing motion, it is more important to be able to identify 
when the algorithm has converged to the wrong stationary point than it is to try to ensure 
that it never does. The numerical instability of the algorithm in the case of translation 
parallel to the image plane is inherent and cannot be removed without prior knowledge of 
the motion. Nonetheless, we can often obtain useful estimates of the motion, even under 
these conditions, when it can be determined that the algorithm has converged to the local 
minimum closest to the true solution. In the next chapter, I will show that the most reliable indicator of correct convergence is the ratio $\mu_2/\mu_3$, where $\mu_2$ and $\mu_3$ are the middle and largest eigenvalues of the matrix $C$ defined in equation (7.6). This ratio theoretically depends on the orientation of $\mathbf{b}$ with respect to the vector $\mathbf{v}_3 = R\hat{\mathbf{z}}$. For translation parallel to the image plane, and therefore approximately perpendicular to $\mathbf{v}_3$, $\mu_2/\mu_3$ is an increasing function of the field of view and will be $\ll 1$ for most practical imaging systems. When the translation is parallel to $\mathbf{v}_3$, however, $\mu_2/\mu_3 \approx 1$. We can thus determine whether the algorithm has converged to the correct estimate by comparing the actual ratio to the one predicted from the values of $\mathbf{b}$ and $R$ returned by the algorithm. If the actual ratio is small compared to its predicted value, we can reject the solution as unreliable and proceed to the next set of images.
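The comparison can be sketched as follows. The actual ratio comes from the eigenvalues of $C$. The predicted ratio uses the closed-form eigenvalues derived in Chapter 8, equations (8.20) and (8.21), with $D = \tan\phi$. The text does not give a numerical rejection threshold, so `looks_reliable` and its `max_gap` parameter are hypothetical. A gap of 0.2 happens to separate the cases judged reliable and unreliable in Tables 7.1 to 7.3, but it should not be read as the criterion used in this work.

```python
import numpy as np


def actual_ratio(C):
    """mu_2 / mu_3: middle over largest eigenvalue of the symmetric 3x3 matrix C."""
    mu = np.linalg.eigvalsh(C)          # ascending: mu_1 <= mu_2 <= mu_3
    return mu[1] / mu[2]


def predicted_ratio(b, v3, D):
    """Ratio implied by (8.20)-(8.21) for unit b, v3 = R z, and D = tan(phi)."""
    b, v3 = np.asarray(b, dtype=float), np.asarray(v3, dtype=float)
    e1 = D**2 / 4 * (b @ v3) ** 2 + np.linalg.norm(np.cross(b, v3)) ** 2  # (8.20) / (kappa N)
    e2 = D**2 / 4                                                       # (8.21) / (kappa N)
    return min(e1, e2) / max(e1, e2)


def looks_reliable(actual, predicted, max_gap=0.2):
    """Hypothetical acceptance rule: reject when the ratios differ by more than max_gap."""
    return abs(actual - predicted) <= max_gap
```

For example, `looks_reliable(.027, .084)` accepts the first pass of the first lab test, while `looks_reliable(.045, .776)` rejects its second pass.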

Once it has been determined that the computed motion is reliable, a variety of techniques 
may be used to improve the estimates. For example, we can execute the algorithm several times to remove outliers, i.e., correspondence pairs for which $|\lambda_i|$ is much greater than the average, or to apply weighting factors such as those given by equation (7.36) using the previous estimates of $\mathbf{b}$ and $\mathbf{q}$.

7.3 Simulations 

The results of applying the simplified algorithm to the astronaut and lab sequences pre- 
viously seen in Figures 5-6 and 5-7 are given in Tables 7.1-7.3. In the astronaut sequence, 
which was generated by software, the motion with respect to the origin of the camera 
coordinate system is known exactly, while for the lab sequence, it is only known approximately. Both sequences, however, correspond approximately to the classically unstable case of translation perpendicular to $R\hat{\mathbf{z}}$.

The correspondence points used to compute the motion for the astronaut images are those shown in Figure 6-2, which were generated by the matching procedure. As was previously noted, the quality of these matches is very good, and this is reflected in the closeness of the calculated and the true motion. Table 7.1 shows the results of two tests. In the first, initial values were chosen by computing the best pure translation and best pure rotation which fit the data. In order to improve the estimates, two passes of the algorithm were performed: the first to obtain an initial estimate and identify outliers to be removed, and the second on the remaining points. As can be seen, there is very little difference in the solutions computed in the two passes for the astronaut sequence. Of the 135 correspondence points, only 19 were found to have an error greater than one standard deviation above the mean. The reliability of the results is also evidenced by the closeness of the actual and predicted values for the ratio $\mu_2/\mu_3$, given that the field of view for this simulated sequence was set at approximately 55°.

| | $\mathbf{b}$ | $\theta$ | $\boldsymbol{\omega}$ | $S$ | $\mu_2/\mu_3$ actual | $\mu_2/\mu_3$ pred. |
|---|---|---|---|---|---|---|
| True motion | (1.0, 0.0, 0.0) | 5° | (0.0, 1.0, 0.0) | | | |
| First test: initial values from estimates of pure translation and rotation | | | | | | |
| Initial values | (.9982, -.0129, -.0591) | 9.72° | (-.0045, .9998, -.0200) | 7.7e-4 | | |
| First pass | (.9998, -.0164, -.0120) | 4.95° | (-.0132, .9999, -.0011) | 6.0e-6 | .418 | .475 |
| Second pass | (.9999, -.0133, -.0083) | 4.99° | (-.0106, .9999, 0.00) | 3.4e-6 | .441 | .475 |
| Second test: initial values chosen to result in alternate local minimum | | | | | | |
| Initial values | (0.0, 0.0, 1.0) | 5° | (0.0, 1.0, 0.0) | 4.9e-3 | | |
| First pass | (.0481, .0093, .9988) | 10.66° | (-.0016, 1.0000, -.0036) | 7.1e-6 | .586 | .979 |
| Second pass | (.0403, -.0040, .9992) | 10.63° | (-.0012, 1.0000, -.0004) | 3.1e-7 | .471 | .977 |

Table 7.1: Simulation results on the astronaut sequence.

Despite the fact that the field of view is relatively large and there are many correspondence points, it is nonetheless possible to make the algorithm converge to another local minimum. Starting the algorithm with an initial value of $\mathbf{b} = \hat{\mathbf{z}}$, shown as the second test in Table 7.1, the result is quite far from the known solution. Removing outliers from the data does not prevent the algorithm from converging to this incorrect result, nor does any other weighting scheme. It is interesting to note that the value of $\bar{S} = S/N$ is of no use in discriminating between the correct and incorrect solutions, since it is very small in both cases. The actual and predicted values of $\mu_2/\mu_3$, however, do show the difference. In the second test they are .586 and .979 in the first pass and .471 and .977 in the second. The fact that the predicted ratio is significantly different from the actual value indicates that these results are unreliable.








| | $\mathbf{b}$ | $\theta$ | $\boldsymbol{\omega}$ | $S$ | $\mu_2/\mu_3$ actual | $\mu_2/\mu_3$ pred. |
|---|---|---|---|---|---|---|
| Motion of stage | (1.0, 0.0, 0.0) | 5° | (0.0, 0.0, -1.0) | | | |
| First test: initial values from estimates of pure translation and rotation | | | | | | |
| Initial values | (.5867, -.0878, .8050) | 7.43° | (.0278, .8102, -.5855) | 7.06e-5 | | |
| First pass | (.9937, .0007, .1120) | 5.04° | (.1286, -.3815, -.9154) | 4.09e-5 | .027 | .084 |
| Second pass (rejects: 20, 39, 43) | (.0247, -.1316, .9910) | 7.86° | (.0387, .8559, -.5156) | 1.84e-5 | .045 | .776 |
| Second test: initial values chosen to converge to estimate of correct motion | | | | | | |
| Initial values | (.9990, .0316, .0316) | 7.43° | (.0278, .8102, -.5855) | 1.21e-4 | | |
| First pass | (.9961, -.0543, .0703) | 5.12° | (.0207, -.3750, -.9268) | 3.41e-5 | .027 | .083 |
| Second pass (rejects: 39, 43) | (.9972, -.0513, .0552) | 5.88° | (.0137, -.5309, -.8474) | 3.91e-6 | .026 | .083 |
| Third test: same initial values as second test, but points 13, 20, 32, 46 removed by hand | | | | | | |
| Initial values | (.9990, .0316, .0316) | 7.49° | (.0274, .8184, -.5740) | 1.02e-4 | | |
| First pass | (.9940, -.0085, .1095) | 4.84° | (.1126, -.2912, -.9500) | 3.67e-5 | .028 | .084 |
| Second pass (rejects: 39, 43) | (.9961, -.0056, .0885) | 5.74° | (.1026, -.5122, -.8527) | 4.09e-6 | .026 | .083 |

Table 7.2: Simulation results on the lab sequence with points from automatic matching procedure.





| | $\mathbf{b}$ | $\theta$ | $\boldsymbol{\omega}$ | $S$ | $\mu_2/\mu_3$ actual | $\mu_2/\mu_3$ pred. |
|---|---|---|---|---|---|---|
| Motion of stage | (1.0, 0.0, 0.0) | 5° | (0.0, 0.0, -1.0) | | | |
| First test: initial values from estimates of pure translation and rotation | | | | | | |
| Initial values | (.9915, -.0656, -.1126) | 7.69° | (-.0019, .8193, -.5733) | (illegible) | | |
| First pass | (.9964, -.0808, .0262) | 6.07° | (-.0484, -.5737, -.8176) | 2.64e-6 | .067 | .087 |
| Second pass (rejects: 16, 17, 20) | (.9964, -.0802, .0287) | 6.0° | (-.0453, -.5591, -.8278) | 1.55e-6 | .073 | .083 |

Table 7.3: Simulation results on the lab sequence with hand-picked correspondence points.



In the lab sequence, the exact motion with respect to the camera coordinate system is unknown because the system was not accurately calibrated. In order to evaluate the quality of the correspondences obtained by the matching procedure against a known standard, we compare the results for the automatic data with those from a second set of correspondence points chosen by hand. To estimate the motion for both sets of data, we use the following approximate internal calibration matrix, derived from the manufacturer's data on the camera and lens used with these images,

$$K = \begin{pmatrix} 567 & 0.0 & 378 \\ 0.0 & 484 & 242 \\ 0.0 & 0.0 & 1.0 \end{pmatrix} \qquad (7.41)$$

to transform between image plane and world coordinates, as described in Section 2.1.1.
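As an illustration, mapping pixel coordinates to homogeneous ray directions with the matrix (7.41) amounts to applying $K^{-1}$. The sketch assumes that the pixel axes and origin follow the same convention as $K$, and that rays are wanted with unit third component, as in the homogeneous form (2.14). The function name is hypothetical.

```python
import numpy as np

# Approximate internal calibration matrix of equation (7.41)
K = np.array([[567.0, 0.0, 378.0],
              [0.0, 484.0, 242.0],
              [0.0, 0.0, 1.0]])


def pixels_to_rays(uv, K=K):
    """Map (N, 2) pixel coordinates (u, v) to rays K^{-1} (u, v, 1)^T."""
    uv = np.asarray(uv, dtype=float)
    homog = np.column_stack([uv, np.ones(len(uv))])   # (u, v, 1) per row
    return np.linalg.solve(K, homog.T).T              # third component stays 1
```

With this $K$, the pixel (378, 242) maps to the optical axis (0, 0, 1), and moving 567 pixels to the right gives the ray (1, 0, 1).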

The correspondences obtained by the matching procedure are shown in Figure 6-3. It 
was previously observed that four of the points, specifically 13, 20, 32, and 46, were clearly 
wrong, while it was less obvious whether other points were also in error. Table 7.2 lists the results of three tests conducted on the data from the matching procedure. In the first, the estimates of pure translation and pure rotation were used as starting values for the algorithm. Although the initial value for $\mathbf{b}$, $(.5867, -.0878, .8050)$, is quite far from the approximate translation direction, the algorithm does converge to a reasonably close solution on the first pass. On the second pass, however, it falls into an alternate stationary point, indicating that removing outliers on a heuristic basis does not always give a better estimate. The actual and predicted values of $\mu_2/\mu_3$ once again point out the difference in the two results. In the first we have .027 and .084 for the actual and predicted ratios, computed for an effective field of view of 30°, while in the second we have .045 and .776, indicating an unreliable result.

In the second test, a starting value of b = (.9990, .0316, .0316) was used, and this time 
the algorithm converged to a reliable solution on both passes. Interestingly, none of the 
four clearly incorrect matches was rejected after the first pass, although points 39 and 43, 
which are not so obviously wrong, were rejected. The actual and predicted values of $\mu_2/\mu_3$,
(.027, .083) on the first pass and (.026, .083) on the second, are very close and indicate
that the solutions are reliable. Although the errors for points 13, 20, 32, and 46 are quite 
noticeable, careful examination reveals that they do not have a large component in the 
direction perpendicular to the correct epipolar line, while points 39 and 43, which were 
rejected, do. In the third test, we verify directly that these four points have little effect on 
the computed motion by manually removing them from the data set. As seen in Table 7.2, 
the results of this test are almost identical to those of the previous one. 

The manually chosen correspondence points for this sequence are shown in Figure 7-1. There are 28 points in all, which were selected using a high-resolution display and a mouse-driven pointer. The same test that was performed on the points found by the matching procedure, in which the motion was computed using initial values derived from the estimates of pure translation and rotation, was performed on these data, with the results given in Table 7.3. This time the initial estimate of the translation was much closer to the actual value and the algorithm converged to the correct estimate on both passes. As can be seen by comparing Tables 7.3 and 7.2, the estimates computed for the manually and the automatically chosen points are very close: $\mathbf{b} = (.9964, -.0802, .0287)$, $\theta = 6.0^\circ$, $\boldsymbol{\omega} = (-.0453, -.5591, -.8278)$ for the manual data vs. $\mathbf{b} = (.9972, -.0513, .0552)$, $\theta = 5.88^\circ$, $\boldsymbol{\omega} = (.0137, -.5309, -.8474)$ and $\mathbf{b} = (.9961, -.0056, .0885)$, $\theta = 5.74^\circ$, $\boldsymbol{\omega} = (.1026, -.5122, -.8527)$ in the second and third tests with the automatic data.

Figure 7-1: Binary edge maps of real motion sequence with hand picked point matches.

Based on the results from both the astronaut and lab sequences, we can thus conclude 
that the data obtained by the matching procedure are of comparable quality, at least with 
respect to estimating motion, to those obtained by more elaborate methods. 



Chapter 8 



The Effects of Measurement Errors 



Understanding the effects of errors in the data on the estimated motion is critical in
designing an automated system for tasks that demand high reliability, such as navigating an 
autonomous vehicle. Previous studies on the effects of error, however, have been incomplete 
and would lead one to believe that any system built from current technologies would at best 
give poor results. 

In this chapter, I will analyze in detail the numerical stability of the motion algorithm, 
which affects its sensitivity to error, and derive analytic expressions for the expected esti- 
mation error in the case of both random and systematic errors in the data. I will also derive 
the conditions under which the algorithm will converge to an alternate local minimum and 
develop the theoretical basis of the ratio test to determine if the solution reported by the 
algorithm is reliable. In the last section, I will compare the theoretical predictions of the 
first part of the chapter with the results from simulations on data from artificially generated 
motion sequences with varying amounts of added error. 

Several important results are obtained in this analysis. The first, of course, is the 
development of the ratio test to determine reliability. Just as important, however, is the 
fact that the error analysis provides us with guidelines for designing a system with the 
required sensor resolution and field of view, as well as with the required number and size 
of matching circuits, to obtain a given maximum expected error in the estimated motion. 
Finally, I also derive an interesting practical result which is that precise internal camera 
calibration is not necessary in order to obtain accurate estimates of the translation direction, 
as long as the rotation is estimated as well. 
Figure 8-1: Geometry of the position vector for a point on the image plane and cone of
all vectors for a given field of view. 



8.1 Numerical Stability 

In order to understand the problems of numerical instability, we must have analytical 
forms for the matrices C and A which are used in the procedures to update b and q. 
Assuming the correspondence points are distributed uniformly over the left and right images, we can approximate the summation of equation (7.3) by an integration over the field of view:

$$S = \sum_{i=1}^N w_i\lambda_i^2 \approx \frac{N}{\pi D^2}\int_0^D\!\!\int_0^{2\pi} w(\xi,\alpha)\,\lambda^2(\xi,\alpha)\,\xi\,d\alpha\,d\xi \qquad (8.1)$$



We assume that the correspondence points in the left image are contained within a circle of radius $D$ centered about the point $(0, 0)$, as shown in Figure 8-1, and use the homogeneous form (2.14) of representing ray directions. The vector $\boldsymbol{\ell}$ from the center of projection to a point $(\xi\cos\alpha, \xi\sin\alpha)$ on the image plane is thus

$$\boldsymbol{\ell} = \begin{pmatrix}\xi\cos\alpha\\ \xi\sin\alpha\\ 1\end{pmatrix} \qquad (8.2)$$

where $\xi \in [0, D]$ and $\alpha \in [0, 2\pi)$. Since we have implicitly set $f = 1$, the viewing angle $\phi$, measured from the optical axis to the edge of the circle, is computed as

$$\phi = \tan^{-1} D \qquad (8.3)$$
The factor $N/\pi D^2$ in equation (8.1) takes into account the number of correspondence points by representing them as a uniform density over the viewing field. It is not appropriate to make the assumption that $N$ scales proportionally to the viewing area, that is as $D^2$, because the number of pixels on the sensor, which ultimately limits $N$, is constant. If we change the optics on the imaging system to give a wider field of view, the number of correspondences will not increase significantly; the points will simply be distributed over a larger area.

In this section I will derive the analytical forms for C and A in the case where the data 
are error-free, as well as the conditions for S to have multiple local minima corresponding 
to feasible solutions. In the next section I will analyze the case of imperfect data and study 
the effects of error on the reliability of the motion estimates. 

8.1.1 Eigenvalues and eigenvectors of C 

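In Section 7.1.1, we defined the matrix $C$ as

$$C = \sum_{i=1}^N w_i\,\mathbf{c}_i\mathbf{c}_i^T \qquad (8.4)$$

$$\;\;= \sum_{i=1}^N w_i\,(\boldsymbol{\ell}'_i\times\mathbf{r}_i)(\boldsymbol{\ell}'_i\times\mathbf{r}_i)^T \qquad (8.5)$$

where $\boldsymbol{\ell}'_i = R\,\boldsymbol{\ell}_i$.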

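In this section, we assume that the data are error-free so that the rigid body motion equation

$$\mathbf{p}_r = R\,\mathbf{p}_\ell + \mathbf{b} \qquad (8.6)$$

previously seen as equation (2.4) in Chapter 2, holds exactly. We define $\mathbf{r}_i$ and $\boldsymbol{\ell}_i$ as the homogeneous vectors

$$\mathbf{r}_i = \hat{\mathbf{p}}_{r_i} = \frac{1}{Z_{r_i}}\mathbf{p}_{r_i} \quad\text{and}\quad \boldsymbol{\ell}_i = \hat{\mathbf{p}}_{\ell_i} = \frac{1}{Z_{\ell_i}}\mathbf{p}_{\ell_i} \qquad (8.7)$$

so that

$$Z_{r_i}\mathbf{r}_i = Z_{\ell_i}\boldsymbol{\ell}'_i + \mathbf{b} \qquad (8.8)$$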
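Taking the cross product of both sides with $\boldsymbol{\ell}'_i$ we thus have

$$\boldsymbol{\ell}'_i\times\mathbf{r}_i = \frac{1}{Z_{r_i}}(\boldsymbol{\ell}'_i\times\mathbf{b}) \qquad (8.9)$$

Since there is no error in the data, we set $w_i = 1$ and write $C$ as

$$C = \sum_{i=1}^N(\boldsymbol{\ell}'_i\times\mathbf{r}_i)(\boldsymbol{\ell}'_i\times\mathbf{r}_i)^T \qquad (8.10)$$

$$\;\;= \sum_{i=1}^N\frac{1}{Z_{r_i}^2}(\boldsymbol{\ell}'_i\times\mathbf{b})(\boldsymbol{\ell}'_i\times\mathbf{b})^T \qquad (8.11)$$

$$\;\;= -B_\times\left(\sum_{i=1}^N\frac{1}{Z_{r_i}^2}\,\boldsymbol{\ell}'_i\boldsymbol{\ell}'^T_i\right)B_\times \qquad (8.12)$$

where $B_\times$ is the skew-symmetric matrix satisfying $B_\times\mathbf{v} = \mathbf{b}\times\mathbf{v}$,

$$B_\times = \begin{pmatrix}0 & -b_z & b_y\\ b_z & 0 & -b_x\\ -b_y & b_x & 0\end{pmatrix} \qquad (8.13)$$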
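We now make the approximation that the correspondences are distributed uniformly over the image and replace the summation of equation (8.12) by the integral

$$C = -B_\times\left(\frac{N}{\pi D^2}\int_0^D\!\!\int_0^{2\pi}\frac{1}{Z_r^2}\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\alpha\,d\xi\right)B_\times \qquad (8.14)$$

The distribution of depths, $Z_r$, of points in the scene is of course unknown. However, given that we are interested only in analyzing the general structure of the matrix $C$ for different types of motion, we can consider $Z_r$ as a random variable whose probability distribution is independent of $\xi$ and $\alpha$, and can therefore replace the $1/Z_r^2$ term by its expected value and take it outside the integral. We then have

$$C = -\kappa B_\times\left(\frac{N}{\pi D^2}\int_0^D\!\!\int_0^{2\pi}\boldsymbol{\ell}'\boldsymbol{\ell}'^T\,\xi\,d\alpha\,d\xi\right)B_\times \qquad (8.15)$$

where $\kappa$ has been defined as

$$\kappa = E\!\left[\frac{1}{Z_r^2}\right] \qquad (8.16)$$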



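This integral is now in the form of equation (B.3) whose solution is given in Appendix B, equation (B.20). The result is

$$C = -\kappa N B_\times\left(\frac{D^2}{4}\big(I - \mathbf{v}_3\mathbf{v}_3^T\big) + \mathbf{v}_3\mathbf{v}_3^T\right)B_\times \qquad (8.17)$$

$$\;\;= \kappa N\left(\frac{D^2}{4}\Big(I - \mathbf{b}\mathbf{b}^T - (\mathbf{b}\times\mathbf{v}_3)(\mathbf{b}\times\mathbf{v}_3)^T\Big) + (\mathbf{b}\times\mathbf{v}_3)(\mathbf{b}\times\mathbf{v}_3)^T\right) \qquad (8.18)$$

where $\mathbf{v}_3 = R\hat{\mathbf{z}}$ represents the rotation of the optical axis in the left camera system.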

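By inspection, we can see that both $\mathbf{b}$ and $\mathbf{b}\times\mathbf{v}_3$ are eigenvectors of (8.18), with eigenvalues $\mu_{\mathbf{b}} = 0$ and

$$\mu_{\mathbf{b}\times\mathbf{v}_3} = \kappa N\left(\frac{D^2}{4}\big(1 - |\mathbf{b}\times\mathbf{v}_3|^2\big) + |\mathbf{b}\times\mathbf{v}_3|^2\right) \qquad (8.19)$$

$$\;\;= \kappa N\left(\frac{D^2}{4}(\mathbf{b}\cdot\mathbf{v}_3)^2 + |\mathbf{b}\times\mathbf{v}_3|^2\right) \qquad (8.20)$$

Consequently $(\mathbf{b}\times\mathbf{v}_3)\times\mathbf{b}$ is also an eigenvector, with eigenvalue

$$\mu_{(\mathbf{b}\times\mathbf{v}_3)\times\mathbf{b}} = \frac{\kappa N D^2}{4} \qquad (8.21)$$

Only the eigenvalue of $\mathbf{b}\times\mathbf{v}_3$ depends on the motion. Its extreme values are obtained when $\mathbf{b}\perp\mathbf{v}_3$ and when $\mathbf{b}\parallel\mathbf{v}_3$. If $\mathbf{b}\perp\mathbf{v}_3$ we have

$$\mu_{\mathbf{b}\times\mathbf{v}_3} = \kappa N \qquad (8.22)$$

For $D < 2$, or a viewing angle $\phi < 63.4^\circ$, this will be the largest eigenvalue, and the ratio of the second largest to the largest eigenvalue will be

$$\frac{\mu_2}{\mu_3} = \frac{D^2}{4} \qquad (8.23)$$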

The numerical stability of determining the translation with known rotation is related to this ratio. If it is small, i.e., not much larger than zero, which is the ratio of the smallest to the largest eigenvalue, adding error to the data can cause the two smallest eigenvalues to switch places. In most practical situations $\mathbf{v}_3$ will be very close to $\hat{\mathbf{z}}$. From equation (2.16), $\mathbf{v}_3$, which is the third column of the rotation matrix $R$, is given by



$$\mathbf{v}_3 = \begin{pmatrix}\omega_x\omega_z(1-\cos\theta) + \omega_y\sin\theta\\ \omega_y\omega_z(1-\cos\theta) - \omega_x\sin\theta\\ \cos\theta + \omega_z^2(1-\cos\theta)\end{pmatrix} \qquad (8.24)$$

If $\theta = 0$ or $\hat\omega = \hat z$, $v_3$ is identically equal to $\hat z$. If $\hat\omega \neq \hat z$ and $\theta$ is so large that the approximation $\cos\theta \approx 1$ is not valid, the two cameras will not image the same scene. The unstable case therefore usually occurs when the motion is (nearly) parallel to the image plane and the field of view is small, as reported by Daniilidis and Nagel [6].

In that case the eigenvector corresponding to the second-largest eigenvalue is $(b\times v_3)\times b = v_3$. Suppose the standard deviation of the error in the data is greater than the difference between the two smallest eigenvalues. The procedure Update_b() may then report a translation direction close to $\hat z$ instead of close to the true direction (for example, $\hat x$). This is the instability most commonly observed in practice.

The other extreme case corresponds to $b = v_3$, or motion (nearly) parallel to the optical axis. In this case $v_3$ is an eigenvector, and so is any vector perpendicular to $v_3$, since the direction of $b\times v_3$ is undefined. The two largest eigenvalues are equal,

$$\mu_2 = \mu_3 = \frac{\kappa N D^2}{4} \qquad (8.25)$$

and thus the estimation of the translation direction is numerically stable, independently of the field of view.
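The closed-form eigenvalues (8.20)–(8.21) make the stability argument easy to check. The sketch below normalizes by $\kappa N$ and evaluates the middle-to-largest ratio for the two extreme motions. It is a minimal sketch assuming unit input vectors, and the helper names are hypothetical.

```python
def c0_eigenvalues(D, b, v3):
    """Eigenvalues of C0 divided by kappa*N, from Eqs. (8.20)-(8.21), sorted ascending.
    b and v3 must be unit vectors; D is the normalized image radius."""
    cos_sq = sum(x * y for x, y in zip(b, v3)) ** 2      # (b . v3)^2
    sin_sq = max(0.0, 1.0 - cos_sq)                       # |b x v3|^2 for unit vectors
    mu_b = 0.0                                            # b spans the null space
    mu_bxv3 = (D * D / 4.0) * cos_sq + sin_sq             # Eq. (8.20)
    mu_perp = D * D / 4.0                                 # (b x v3) x b, Eq. (8.21)
    return sorted([mu_b, mu_bxv3, mu_perp])

def middle_to_largest_ratio(D, b, v3):
    """D^2/4 for b perpendicular to v3 and D < 2 (Eq. 8.23); 1 for b = v3 (Eq. 8.25)."""
    _, mid, top = c0_eigenvalues(D, b, v3)
    return mid / top

x_hat, z_hat = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
for D in (0.2, 0.5, 1.0):
    print(D, middle_to_largest_ratio(D, x_hat, z_hat), middle_to_largest_ratio(D, z_hat, z_hat))
```

For a baseline in the image plane the ratio falls as $D^2/4$. For motion along the optical axis it stays at 1, whatever the field of view.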

8.1.2 Condition of the matrix A 

The condition number of the matrix $A$, $K_A$, defined as the ratio of its largest to its smallest eigenvalue, measures how close $A$ is to singularity. Since the incremental update to the rotation requires inverting $A$, $K_A$ is the critical parameter in determining the numerical stability of this procedure.

In equation (7.16) we defined $A$ as

$$A = \sum_{i=1}^{N} w_i\, a_i a_i^T \qquad (8.26)$$

where

$$a_i = (b\times r_i)\times\ell'_i \qquad (8.27)$$

As before, we assume that the data are error-free, so that $w_i = 1$, and we can write

$$Z_{ri}\, r_i = Z_{li}\,\ell'_i + b \qquad (8.28)$$

Taking the cross product of both sides with $b$, we have

$$b\times r_i = \frac{Z_{li}}{Z_{ri}}\,(b\times\ell'_i) \qquad (8.29)$$

and hence

$$a_i = \frac{Z_{li}}{Z_{ri}}\,(b\times\ell'_i)\times\ell'_i \qquad (8.30)$$

$$\phantom{a_i} = \frac{Z_{li}}{Z_{ri}}\left[(b\cdot\ell'_i)\,\ell'_i - |\ell'_i|^2\, b\right] \qquad (8.31)$$



Substituting the above expression for $a_i$ into equation (8.26), we obtain

$$A = \sum_{i=1}^{N}\left(\frac{Z_{li}}{Z_{ri}}\right)^2\left[(b\cdot\ell'_i)^2\,\ell'_i\ell'^T_i - |\ell'_i|^2(b\cdot\ell'_i)\left(b\,\ell'^T_i + \ell'_i b^T\right) + |\ell'_i|^4\, bb^T\right] \qquad (8.32)$$

If the distances to objects in the scene are large compared to the baseline length, the terms $(Z_{li}/Z_{ri})^2$ should be close to 1. In any case, we may assume they are random variables that are independent of position, and replace them by their expected value. Let

$$\gamma = E\left[\left(\frac{Z_{li}}{Z_{ri}}\right)^2\right] \qquad (8.33)$$



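Then, making the approximation that the correspondences are distributed uniformly over the image, equation (8.32) becomes

$$A \approx \frac{\gamma N}{\pi D^2}\int_0^D\!\!\int_0^{2\pi}\left[(b\cdot\ell')^2\,\ell'\ell'^T - |\ell'|^2(b\cdot\ell')\left(b\,\ell'^T + \ell' b^T\right) + |\ell'|^4\, bb^T\right]\rho\,d\rho\,d\alpha \qquad (8.34)$$

Each term on the right-hand side of this expression corresponds to one of the special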



CHAPTER 8. THE EFFECTS OF MEASUREMENT ERRORS 






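integrals computed in Appendix B. The solutions are as follows:

$$\begin{aligned}\int_0^D\!\!\int_0^{2\pi}(b\cdot\ell')^2\,\ell'\ell'^T\rho\,d\rho\,d\alpha = \pi D^2\Big[&\frac{D^4}{24}\left(2ww^T + |w|^2\left(I - v_3v_3^T\right)\right)\\ &+ \frac{D^2}{4}\left(|w|^2 v_3v_3^T + 2(b\cdot v_3)\left(wv_3^T + v_3w^T\right) + (b\cdot v_3)^2\left(I - v_3v_3^T\right)\right)\\ &+ (b\cdot v_3)^2\, v_3v_3^T\Big]\end{aligned} \qquad (8.35)$$

where

$$w = (v_3\times b)\times v_3 = \left(I - v_3v_3^T\right)b \qquad (8.36)$$

For the second term,

$$\begin{aligned}\int_0^D\!\!\int_0^{2\pi}|\ell'|^2(b\cdot\ell')\left(b\ell'^T + \ell'b^T\right)\rho\,d\rho\,d\alpha &= \int_0^D\!\!\int_0^{2\pi}|\ell'|^2\left(bb^T\ell'\ell'^T + \ell'\ell'^Tbb^T\right)\rho\,d\rho\,d\alpha\\ &= \pi D^2\left[\left(\frac{D^2}{2} + \frac{D^4}{3}\right)bb^T + \left(1 + \frac{D^2}{4} - \frac{D^4}{6}\right)(b\cdot v_3)\left(bv_3^T + v_3b^T\right)\right]\end{aligned} \qquad (8.37)$$

and for the last term:

$$\int_0^D\!\!\int_0^{2\pi}|\ell'|^4\, bb^T\rho\,d\rho\,d\alpha = \pi D^2\left(1 + D^2 + \frac{D^4}{3}\right)bb^T \qquad (8.38)$$

Combining (8.35), (8.37), and (8.38) and skipping much of the messy algebra, we obtain

$$\begin{aligned}A \approx \gamma N\Big[&\frac{D^4}{24}\left(2ww^T + |w|^2\left(I - v_3v_3^T\right)\right) + \frac{D^2}{4}\left(|w|^2 v_3v_3^T + 2(b\cdot v_3)\left(wv_3^T + v_3w^T\right) + (b\cdot v_3)^2\left(I - v_3v_3^T\right)\right)\\ &+ (b\cdot v_3)^2\, v_3v_3^T + \left(1 + \frac{D^2}{2}\right)bb^T - \left(1 + \frac{D^2}{4} - \frac{D^4}{6}\right)(b\cdot v_3)\left(bv_3^T + v_3b^T\right)\Big]\end{aligned} \qquad (8.39)$$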



Since

$$w\cdot(b\times v_3) = v_3\cdot(b\times v_3) = b\cdot(b\times v_3) = 0 \qquad (8.40)$$

$b\times v_3$ is clearly an eigenvector of $A$, with eigenvalue

$$\mu_{b\times v_3} = \frac{\gamma N D^2}{4}\left(\frac{D^2}{6} + \left(1 - \frac{D^2}{6}\right)(b\cdot v_3)^2\right) \qquad (8.41)$$

The other two eigenvectors are therefore linear combinations of $b$ and $v_3$. They do not coincide exactly with either $b$ or $v_3$ unless $b \perp v_3$ or $b \parallel v_3$.
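If $b \perp v_3$, then $w = b$ and $A$ becomes

$$A = \gamma N\left[\left(1 + \frac{D^2}{2} + \frac{D^4}{12}\right)bb^T + \frac{D^4}{24}\left(I - v_3v_3^T\right) + \frac{D^2}{4}\,v_3v_3^T\right] \qquad (8.42)$$

The eigenvectors and eigenvalues are then, in increasing order,

$$b\times v_3, \text{ with } \mu_{b\times v_3} = \frac{\gamma N D^4}{24} \qquad (8.43)$$

$$v_3, \text{ with } \mu_{v_3} = \frac{\gamma N D^2}{4} \qquad (8.44)$$

and

$$b, \text{ with } \mu_b = \gamma N\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.45)$$

This order holds for $D < \sqrt{6}$, or for a viewing angle $\phi < 67.8^\circ$; at that point $\mu_{b\times v_3} = \mu_{v_3}$. The condition number is given by

$$K_A = \frac{\mu_b}{\mu_{b\times v_3}} = \frac{24}{D^4}\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.46)$$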



As $D \to 0$, $K_A \to \infty$, as we should expect. This reflects a well-known phenomenon of small fields of view: the displacement pattern caused by a rotation interferes with the pattern caused by a translation parallel to the image plane.

As $D$ increases, $K_A$ decreases, reaching about 5.7 at $D = \sqrt{6}$. Beyond that point $v_3$ replaces $b\times v_3$ as the eigenvector with the smallest eigenvalue, and $K_A = \mu_b/\mu_{v_3}$ grows again. For values of $D$ found in real imaging systems, however, $K_A$ remains quite large. For example, if $\phi < 60^\circ$ ($D^2 < 3$), $K_A > 9.67$. Hence the estimate of the rotation will always be very sensitive to error when $b\cdot v_3 = 0$.
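The eigenvalues (8.43)–(8.45) can be checked numerically without any eigen-solver. In the frame $b = \hat x$, $v_3 = \hat z$ (that is, $R_0 = I$, an assumption made here for simplicity), the matrix $A$ is diagonal. The sketch averages $a a^T$ from (8.31) over a uniform disc with the midpoint rule in polar coordinates and takes $\gamma = 1$. Function names and grid sizes are illustrative choices.

```python
import math

def averaged_A_diagonal(D, n_r=400, n_a=64):
    """Average of a a^T over a uniform disc of radius D, for b = x_hat, v3 = z_hat
    (b perpendicular to v3) and gamma = 1, using a = (b.l) l - |l|^2 b with
    l = (x, y, 1) as in Eq. (8.31). Midpoint rule in polar coordinates.
    Returns the diagonal entries (xx, yy, zz); the off-diagonal entries vanish
    by symmetry, so these are the eigenvalues of A / (gamma N)."""
    sums = [0.0, 0.0, 0.0]
    total = 0.0
    dr, da = D / n_r, 2.0 * math.pi / n_a
    for i in range(n_r):
        rho = (i + 0.5) * dr
        for j in range(n_a):
            alpha = (j + 0.5) * da
            x, y = rho * math.cos(alpha), rho * math.sin(alpha)
            a = (-1.0 - y * y, x * y, x)   # (b.l) l - |l|^2 b, with b.l = x
            w = rho * dr * da              # area element rho d(rho) d(alpha)
            for k in range(3):
                sums[k] += w * a[k] * a[k]
            total += w
    return [s / total for s in sums]

def closed_form(D):
    """Eigenvalues of A / (gamma N) from Eqs. (8.43)-(8.45)."""
    return {"b": 1 + D**2 / 2 + D**4 / 8, "b_x_v3": D**4 / 24, "v3": D**2 / 4}

D = math.sqrt(3.0)                    # phi = atan(D) = 60 degrees
numeric = averaged_A_diagonal(D)      # order: b (x), b x v3 (y), v3 (z)
exact = closed_form(D)
print(numeric)                        # close to [3.625, 0.375, 0.75]
print(exact["b"] / exact["b_x_v3"])   # K_A from Eq. (8.46): 29/3, about 9.67
```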

If $b = v_3$, then $w = 0$ and $A$ becomes

$$A = \frac{\gamma N D^2}{4}\left(I - v_3v_3^T\right) + \frac{\gamma N D^4}{3}\,v_3v_3^T \qquad (8.47)$$

so $v_3$ is an eigenvector, and so is any vector perpendicular to $v_3$. The eigenvalues are

$$\mu_{v_3} = \frac{\gamma N D^4}{3} \qquad (8.48)$$

and

$$\mu_\perp = \frac{\gamma N D^2}{4} \qquad (8.49)$$

When $D^2 = 3/4$, or $\phi = 40.9^\circ$, the eigenvalues are equal. For $D < \sqrt{3/4}$,

$$K_A = \frac{\mu_\perp}{\mu_{v_3}} = \frac{3}{4D^2} \qquad (8.50)$$

while for $D > \sqrt{3/4}$,

$$K_A = \frac{\mu_{v_3}}{\mu_\perp} = \frac{4D^2}{3} \qquad (8.51)$$

$K_A$ thus has a minimum at $D = \sqrt{3/4}$ and goes to infinity both as $D \to 0$ and as $D \to \infty$. For $60^\circ > \phi > 23.4^\circ$, however, $K_A < 4$. For the viewing fields used in most real imaging systems, the estimation of the rotation will therefore be robust.
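The thresholds quoted for motion along the optical axis follow from (8.48)–(8.51). They use the viewing-angle convention $\tan\phi = D$, which matches the pairs $D = 2 \leftrightarrow 63.4^\circ$ and $D = \sqrt{6} \leftrightarrow 67.8^\circ$ used above. That convention is inferred from those numbers, so treat it as an assumption. The function names below are hypothetical.

```python
import math

def condition_number_parallel(D):
    """K_A when b = v3, Eqs. (8.48)-(8.51): eigenvalues gamma N D^2/4 (twice) and
    gamma N D^4/3; gamma N cancels in the ratio."""
    mu_perp, mu_v3 = D**2 / 4.0, D**4 / 3.0
    return max(mu_perp, mu_v3) / min(mu_perp, mu_v3)

def viewing_angle_deg(D):
    """Viewing angle phi with tan(phi) = D (assumed convention)."""
    return math.degrees(math.atan(D))

D_eq = math.sqrt(0.75)
print(condition_number_parallel(D_eq), viewing_angle_deg(D_eq))   # about 1.0 and 40.9
# K_A < 4 exactly when 3/16 < D^2 < 3:
print(viewing_angle_deg(math.sqrt(3 / 16)), viewing_angle_deg(math.sqrt(3)))  # about 23.4 and 60.0
```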

8.1.3 Minimizers of S 

Using the results derived in this section, we can now determine the conditions under which $S$ has more than one stationary point corresponding to a feasible, nontrivial solution. Suppose $D \neq 0$ and the motion is not a pure rotation (in which case $C = 0$). Then the smallest eigenvalue of $C$ given the true rotation is unique, and hence so is the solution for the baseline.

Let $b_0$ and $R_0$ denote the baseline and rotation corresponding to the actual motion. If an alternate minimum of $S$ exists, corresponding to the solution $b'$ and $R'$, it must be the case that

$$R' \neq R_0 \qquad (8.52)$$

$$C'b' = \mu'b' \qquad (8.53)$$

and

$$h' = 0 \qquad (8.54)$$

where $\mu'$ is the smallest eigenvalue of $C'$ and

$$C' = \sum_{i=1}^{N}\left(\ell''_i\times r_i\right)\left(\ell''_i\times r_i\right)^T \qquad (8.55)$$

with $\ell''_i = R'\ell_i$.

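We can always write $R'$ as $R' = \delta R\,R_0$, so that $\ell''_i = \delta R\,\ell'_i$. Let $\delta R$ be an incremental rotation of $\delta\theta$ about an axis $\hat\eta$. The small-angle approximation to Rodrigues' formula then gives

$$\ell''_i \approx \ell'_i + \delta\theta\,(\hat\eta\times\ell'_i) = \ell'_i + \ell'_i\times m \qquad (8.56)$$

where $m = -\delta\theta\,\hat\eta$. We can thus write

$$\lambda'_i = \lambda'_{i0} + \delta\lambda'_i, \quad\text{and}\quad a'_i = a'_{i0} + \delta a'_i \qquad (8.57)$$

where

$$\lambda'_{i0} = b'\cdot(\ell'_i\times r_i) \qquad (8.58)$$

$$a'_{i0} = (b'\times r_i)\times\ell'_i \qquad (8.59)$$

$$\delta\lambda'_i = b'\cdot\left((\ell'_i\times m)\times r_i\right), \qquad \delta a'_i = (b'\times r_i)\times(\ell'_i\times m) \qquad (8.60)$$

Noting that

$$\delta\lambda'_i = -\left((b'\times r_i)\times\ell'_i\right)\cdot m = -a'^T_{i0}\,m \qquad (8.61)$$

the vector $h'$ becomes

$$\begin{aligned}h' &= \sum_{i=1}^{N}\lambda'_i a'_i = \sum_{i=1}^{N}\left(\lambda'_{i0} - a'^T_{i0}m\right)\left(a'_{i0} + \delta a'_i\right)\\ &= \sum_{i=1}^{N}\lambda'_{i0}a'_{i0} - \sum_{i=1}^{N}a'_{i0}a'^T_{i0}\,m + \sum_{i=1}^{N}\left(\lambda'_{i0} - a'^T_{i0}m\right)\delta a'_i\end{aligned} \qquad (8.62)$$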
Let $S^* = \mu'$ denote the value of $S$ given $b'$ and $R'$, and let $C_0$ denote the matrix $C$ computed using $R_0$. From equation (7.14) we then have

$$S^* = \sum_{i=1}^{N}\left(\lambda'^2_{i0} - 2\lambda'_{i0}a'^T_{i0}m + m^Ta'_{i0}a'^T_{i0}m\right) = b'^TC_0b' - 2h'^T_0m + m^TA'_0m \qquad (8.63)$$

where

$$h'_0 = \sum_{i=1}^{N}\lambda'_{i0}a'_{i0}, \quad\text{and}\quad A'_0 = \sum_{i=1}^{N}a'_{i0}a'^T_{i0} \qquad (8.64)$$

If $S^*$ is indeed a local minimum, then from equation (7.17) it must also be the case that

$$h'_0 = A'_0\,m \qquad (8.65)$$

giving

$$S^* = b'^TC_0b' - m^TA'_0m \qquad (8.66)$$

as well as

$$h' = \sum_{i=1}^{N}\left(\lambda'_{i0} - a'^T_{i0}m\right)\delta a'_i \qquad (8.67)$$

Defining

$$\delta h' = \sum_{i=1}^{N}\lambda'_{i0}\,\delta a'_i, \quad\text{and}\quad \delta A' = \sum_{i=1}^{N}\delta a'_i\,a'^T_{i0} \qquad (8.68)$$

we see that the necessary condition $h' = 0$ for $S$ to have a stationary point can be stated as

$$m = (\delta A')^{-1}\,\delta h' \qquad (8.69)$$

We can now exhibit at least one case in which we know an alternate solution exists: it occurs when $b_0 \perp v_3$. Since

$$r_i = \frac{Z_{li}}{Z_{ri}}\,\ell'_i + \frac{1}{Z_{ri}}\,b_0 \qquad (8.70)$$

we can write $a'_{i0}$ as

$$a'_{i0} = \frac{Z_{li}}{Z_{ri}}\,(b'\times\ell'_i)\times\ell'_i - \frac{1}{Z_{ri}}\,u\times\ell'_i \qquad (8.71)$$

where $u = b_0\times b'$. Now let $b' = v_3$ and $m = \alpha u$, where $\alpha$ is some constant, so that

$$S^* = v_3^TC_0v_3 - \alpha^2 u^T\left[\sum_{i=1}^{N}\left(\frac{Z_{li}}{Z_{ri}}\right)^2\left((v_3\times\ell'_i)\times\ell'_i\right)\left((v_3\times\ell'_i)\times\ell'_i\right)^T\right]u \qquad (8.72)$$

The summation in the second term of this equation has the same form as the one in equation (8.32). Its analytic expression, derived by approximating the sum as an integral, is given on the right-hand side of equation (8.39). Upon substituting $v_3$ for $b$ and using the facts that $w' = (v_3\times b')\times v_3 = 0$ and $v_3\cdot b_0 = 0$, we obtain

$$u^T\left[\sum_{i=1}^{N}\left(\frac{Z_{li}}{Z_{ri}}\right)^2\left((b'\times\ell'_i)\times\ell'_i\right)\left((b'\times\ell'_i)\times\ell'_i\right)^T\right]u = \frac{\gamma N D^2}{4} \qquad (8.73)$$

When $v_3 \perp b_0$, $v_3$ is an eigenvector of $C_0$ with eigenvalue

$$\mu_{v_3} = \frac{\kappa N D^2}{4} \qquad (8.74)$$

so we have

$$S^* = \frac{ND^2}{4}\left(\kappa - \alpha^2\gamma\right) \qquad (8.75)$$

If $\alpha^2 = \kappa/\gamma$, $S^*$ will be identically zero and will clearly be a minimum. However, the constants $\kappa$ and $\gamma$ are only approximations to the values that would be obtained if the $Z$ coordinates of the points in the scene were known exactly. We therefore cannot use equation (8.75) to find $\alpha$ by setting $S^* = 0$. To show that the solution $b' = v_3$, $m = \alpha(b_0\times v_3)$ minimizes $S$, we need to show that it satisfies the necessary conditions (8.53) and (8.54).

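We first check that $h' = 0$ by expanding each of the terms in equation (8.67). We apply the rigid-body motion equation (8.70) and the conditions $b_0\cdot v_3 = 0$ and $m = \alpha\,b_0\times v_3$ to obtain

$$\lambda'_{i0} = v_3\cdot(\ell'_i\times r_i) = \frac{1}{Z_{ri}}\,\ell'_i\cdot(b_0\times v_3) \qquad (8.76)$$

$$\begin{aligned}a'^T_{i0}m &= \alpha\frac{Z_{li}}{Z_{ri}}\left((v_3\times\ell'_i)\times\ell'_i\right)\cdot(b_0\times v_3)\\ &= \alpha\frac{Z_{li}}{Z_{ri}}(v_3\cdot\ell'_i)\,\ell'_i\cdot(b_0\times v_3)\end{aligned} \qquad (8.77)$$

and

$$\begin{aligned}\delta a'_i &= \alpha(v_3\times r_i)\times\left(\ell'_i\times(b_0\times v_3)\right)\\ &= \alpha\frac{Z_{li}}{Z_{ri}}(v_3\times\ell'_i)\times\left(\ell'_i\times(b_0\times v_3)\right) + \frac{\alpha}{Z_{ri}}(v_3\times b_0)\times\left(\ell'_i\times(b_0\times v_3)\right)\\ &= -\alpha\left[\frac{Z_{li}}{Z_{ri}}(\ell'_i\cdot b_0)\,\ell'_i + \frac{1}{Z_{ri}}\left(\ell'_i\cdot(b_0\times v_3)\right)(b_0\times v_3)\right]\end{aligned} \qquad (8.78)$$

We can thus write

$$\begin{aligned}h' = &-\alpha\sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}^2}\left(1 - \alpha Z_{li}(v_3\cdot\ell'_i)\right)\left((b_0\times v_3)\cdot\ell'_i\right)\ell'_i\ell'^T_i\,b_0\\ &-\alpha\sum_{i=1}^{N}\frac{1}{Z_{ri}^2}\left(1 - \alpha Z_{li}(v_3\cdot\ell'_i)\right)(b_0\times v_3)^T\ell'_i\ell'^T_i(b_0\times v_3)\,(b_0\times v_3)\end{aligned} \qquad (8.79)$$

Assuming the rays are uniformly distributed, we then replace the sums by integrals to obtain an analytic expression. Before writing the solution, we use a result shown at the ends of Sections B.5 and B.6: substituting $(v_3\cdot\ell'_i) = 1$ in the above equation does not affect the value of either integral. We also note from equation (B.28) that

$$\int_0^D\!\!\int_0^{2\pi}\left((b_0\times v_3)\cdot\ell'\right)\ell'\ell'^T b_0\,\rho\,d\rho\,d\alpha = 0 \qquad (8.80)$$

We thus have

$$\begin{aligned}h' &\approx -\alpha\kappa\left(1 - \alpha\bar Z_l\right)\frac{N}{\pi D^2}\int_0^D\!\!\int_0^{2\pi}(b_0\times v_3)^T\ell'\ell'^T(b_0\times v_3)\,(b_0\times v_3)\,\rho\,d\rho\,d\alpha\\ &= -\alpha\kappa\frac{ND^2}{4}\left(1 - \alpha\bar Z_l\right)(b_0\times v_3)\end{aligned} \qquad (8.81)$$

where $\bar Z_l = E[Z_{li}]$.

The condition $h' = 0$ can therefore be satisfied by setting $\alpha = 1/\bar Z_l$. Again, this is a convenient approximation. It does not change the fact that a constant $\alpha$ satisfying $h' = 0$ can be found by setting

$$\alpha = \frac{\sum_{i=1}^{N}\frac{1}{Z_{ri}^2}(b_0\times v_3)^T\ell'_i\ell'^T_i(b_0\times v_3)}{\sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}^2}(v_3\cdot\ell'_i)(b_0\times v_3)^T\ell'_i\ell'^T_i(b_0\times v_3)} \qquad (8.82)$$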
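The second necessary condition for a minimum is that $b' = v_3$ must be the eigenvector of $C'$ corresponding to its smallest eigenvalue. We write $C'$ as

$$\begin{aligned}C' &= \sum_{i=1}^{N}c'_ic'^T_i = \sum_{i=1}^{N}\left(c_{i0} + \delta c_i\right)\left(c_{i0} + \delta c_i\right)^T\\ &= C_0 + \sum_{i=1}^{N}\left(c_{i0}\delta c_i^T + \delta c_ic_{i0}^T\right) + \sum_{i=1}^{N}\delta c_i\delta c_i^T\end{aligned} \qquad (8.83)$$

where

$$c_{i0} = \ell'_i\times r_i, \qquad \delta c_i = \alpha\left(\ell'_i\times(b_0\times v_3)\right)\times r_i \qquad (8.84)$$

Using the rigid-body motion equation (8.70), we expand $c_{i0}$ as

$$c_{i0} = -\frac{1}{Z_{ri}}\,B_{\times 0}\,\ell'_i \qquad (8.85)$$

where $B_{\times 0}$ is the cross-product matrix corresponding to the operation $b_0\times$. In a similar manner we write $\delta c_i$ as

$$\delta c_i = \frac{\alpha}{Z_{ri}}\left[Z_{li}\left(|\ell'_i|^2I - \ell'_i\ell'^T_i\right) + (b_0\cdot\ell'_i)\,I\right](b_0\times v_3) \qquad (8.86)$$

We thus have

$$c_{i0}\delta c_i^T = -\frac{\alpha}{Z_{ri}^2}\,B_{\times 0}\left[\left(Z_{li}|\ell'_i|^2\ell'_i + \ell'_i\ell'^T_ib_0\right)(b_0\times v_3)^T - Z_{li}\left(\ell'_i\cdot(b_0\times v_3)\right)\ell'_i\ell'^T_i\right] \qquad (8.87)$$

and

$$\begin{aligned}\delta c_i\delta c_i^T = \frac{\alpha^2}{Z_{ri}^2}\Big[&\left(Z_{li}^2|\ell'_i|^4 + 2Z_{li}|\ell'_i|^2\ell'^T_ib_0 + b_0^T\ell'_i\ell'^T_ib_0\right)(b_0\times v_3)(b_0\times v_3)^T\\ &- Z_{li}^2|\ell'_i|^2\left(\ell'_i\ell'^T_i(b_0\times v_3)(b_0\times v_3)^T + (b_0\times v_3)(b_0\times v_3)^T\ell'_i\ell'^T_i\right)\\ &- Z_{li}\left(\ell'_i\cdot(b_0\times v_3)\right)\left(\ell'_i\ell'^T_ib_0(b_0\times v_3)^T + (b_0\times v_3)b_0^T\ell'_i\ell'^T_i\right)\\ &+ Z_{li}^2\left(\ell'_i\cdot(b_0\times v_3)\right)^2\ell'_i\ell'^T_i\Big]\end{aligned} \qquad (8.88)$$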
The above equations are written so that each term can be identified with one of the special integrals solved in Appendix B. We approximate the summations in equation (8.83) as integrals and use the fact that $b_0\cdot v_3 = 0$. The solutions are

$$\sum_{i=1}^{N}\left(c_{i0}\delta c_i^T + \delta c_ic_{i0}^T\right) = -2\alpha\kappa\bar Z_lN\left[\left(1 + \frac{D^2}{4}\right)(b_0\times v_3)(b_0\times v_3)^T + \frac{D^2}{4}\,v_3v_3^T\right] \qquad (8.89)$$

and

$$\begin{aligned}\sum_{i=1}^{N}\delta c_i\delta c_i^T = \;&\alpha^2N\left[\gamma\left(1 + \frac{D^2}{2} + \frac{D^4}{12}\right) + \frac{\kappa D^2}{4}\right](b_0\times v_3)(b_0\times v_3)^T\\ &+ \alpha^2\gamma N\frac{D^4}{24}\,I + \alpha^2\gamma N\frac{D^2}{4}\left(1 - \frac{D^2}{6}\right)v_3v_3^T\end{aligned} \qquad (8.90)$$

It is now clear that $C'$ has the same eigenvectors as $C_0$, namely $b_0$, $v_3$, and $b_0\times v_3$. From equations (8.21), (8.20), (8.83), (8.89), and (8.90), the eigenvalues are given by

$$\mu'_{b_0} = \frac{\alpha^2\gamma ND^4}{24} \qquad (8.91)$$

$$\mu'_{v_3} = \frac{ND^2}{4}\left(\kappa - \alpha^2\gamma\left(\frac{2\kappa\bar Z_l}{\alpha\gamma} - 1\right)\right) \qquad (8.92)$$

and

$$\mu'_{b_0\times v_3} = \kappa N\left(1 - 2\alpha\bar Z_l\left(1 + \frac{D^2}{4}\right) + \frac{\alpha^2D^2}{4}\right) + \alpha^2\gamma N\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.93)$$

The only difference between equations (8.92) and (8.75) is the factor $\left(2\kappa\bar Z_l/(\alpha\gamma) - 1\right)$ multiplying $\alpha^2\gamma$. It arises from the different way intermediate terms were grouped in deriving these equations. $\kappa$, $\gamma$, and $\bar Z_l$ are only convenient symbols for unknown values, so differences in terms involving products of these constants should not necessarily be considered significant. If the distances in the image are constant and equal, i.e., $Z_{li} = Z_{ri} = Z$ for $i = 1, \ldots, N$, we can set $\alpha = 1/Z$, giving

$$\frac{2\kappa\bar Z_l}{\alpha\gamma} - 1 = 1 \qquad (8.94)$$

and hence

$$\mu'_{v_3} = S^* = 0 \qquad (8.95)$$

We nonetheless expect $\mu'_{v_3} \approx 0$ for many distributions of depths in the scene when $b_0 \perp v_3$. The two smallest eigenvalues of $C'$ are very close to each other and both much less than $\mu'_{b_0\times v_3}$. With the inevitable errors in the data, it can therefore easily happen that $\mu'_{v_3} < \mu'_{b_0}$.
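The constant-depth case (8.94)–(8.95) can be reproduced from sample depths. The sketch below estimates $\kappa$, $\gamma$ and $\bar Z_l$ from hypothetical $(Z_l, Z_r)$ pairs. It also estimates $\alpha$ from (8.82) under two extra assumptions: depth is independent of image position, and $v_3\cdot\ell' = 1$, so the image-plane factors cancel. All names are illustrative.

```python
def depth_constants(pairs):
    """kappa = E[1/Zr^2], gamma = E[(Zl/Zr)^2] (Eq. 8.33), and E[Zl] from hypothetical
    (Zl, Zr) depth samples."""
    n = len(pairs)
    kappa = sum(1.0 / zr**2 for _, zr in pairs) / n
    gamma = sum((zl / zr) ** 2 for zl, zr in pairs) / n
    zl_mean = sum(zl for zl, _ in pairs) / n
    return kappa, gamma, zl_mean

def alpha_estimate(pairs):
    """Eq. (8.82) with v3 . l' = 1 and depths independent of image position (assumptions):
    alpha = sum(1/Zr^2) / sum(Zl/Zr^2)."""
    return sum(1.0 / zr**2 for _, zr in pairs) / sum(zl / zr**2 for zl, zr in pairs)

def mu_prime_v3(N, D, pairs):
    """Eigenvalue of C' along v3 at the alternate solution b' = v3, Eq. (8.92)."""
    kappa, gamma, zl_mean = depth_constants(pairs)
    a = alpha_estimate(pairs)
    return (N * D**2 / 4) * (kappa - 2 * a * kappa * zl_mean + a**2 * gamma)

flat_scene = [(10.0, 10.0)] * 50            # constant depth, as in Eq. (8.94)
print(mu_prime_v3(100, 0.8, flat_scene))    # zero up to rounding, Eq. (8.95)
```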

This derivation shows the basis for the test presented in the last chapter, which determines whether the algorithm has fallen into an alternate minimum. If $b_0 \parallel v_3$, then, as we saw in equation (8.25), the ratio of the middle to the largest eigenvalue is

$$\frac{\mu_2}{\mu_3} \approx 1 \qquad (8.96)$$

If, on the other hand, $b_0 \perp v_3$ and $b' = v_3$, then

$$\frac{\mu_2}{\mu_3} \approx 0 \qquad (8.97)$$

The ratio test is simple to compute and, based on numerous experiments, has proven to be an extremely reliable indicator of a false solution. It has consistently outperformed other measures that can be readily obtained from the data, such as $S$ or $K_C$, the condition number of $C$.
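The ratio test needs only the eigenvalues of the $3\times 3$ symmetric matrix $C$. The sketch below computes them with the standard closed-form trigonometric method and compares the middle and largest eigenvalues. The decision threshold of 0.5 is an illustrative assumption, not a value given in the text, and both function names are hypothetical.

```python
import math

def sym3_eigenvalues(M):
    """Eigenvalues, in ascending order, of a real symmetric 3x3 matrix M
    (nested lists), computed with the closed-form trigonometric method."""
    p1 = M[0][1] ** 2 + M[0][2] ** 2 + M[1][2] ** 2
    q = (M[0][0] + M[1][1] + M[2][2]) / 3.0
    if p1 == 0.0:                                   # already diagonal
        return sorted([M[0][0], M[1][1], M[2][2]])
    p2 = sum((M[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[(M[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))            # clamp against rounding
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    middle = 3.0 * q - largest - smallest           # trace identity
    return [smallest, middle, largest]

def looks_like_false_minimum(C, threshold=0.5):
    """Ratio test at a solution with b' close to v3: a middle/largest ratio near 1
    matches genuine motion along the optical axis (Eq. 8.96), a ratio near 0 the
    alternate minimum (Eq. 8.97). The threshold is an assumed value."""
    _, mu2, mu3 = sym3_eigenvalues(C)
    return mu2 / mu3 < threshold

# Illustrative matrices (kappa*N = 1, D = 0.5) in the frame b0 = x_hat, v3 = z_hat.
C_axial = [[0.0625, 0.0, 0.0], [0.0, 0.0625, 0.0], [0.0, 0.0, 0.0]]   # Eq. (8.25)
C_suspect = [[1.0, 0.0, 0.0], [0.0, 0.002, 0.0], [0.0, 0.0, 0.001]]
print(looks_like_false_minimum(C_axial), looks_like_false_minimum(C_suspect))  # False True
```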



8.2 Error Analysis 

Having analyzed the numerical stability of the procedures for estimating translation and 
rotation as a function of both the size of the viewing field and the type of motion, we can 
now quantify more precisely the robustness of the estimates as a function of the error in 
the data. 

Errors in the data may arise from both random and systematic sources. Systematic 
error can usually be attributed to poor calibration of the imaging system while random 
errors result from the finite resolution of the image sensor and from approximations made 
in the matching procedure. Since the sensor is discretized, each pixel subtends a finite solid 
angle, which is approximated by a single vector from the center of projection to the center 
of the pixel. The block-matching procedure compounds the error due to the finite pixel size by assigning the correspondence to the centers of the best-matching blocks.

Figure 8-2: Change in $r_i$ caused by error in determining the exact location of the correspondence with the left image.

The combined result of both random and systematic errors is a displacement, in the image plane, of the estimated correspondence point from its true position. We assume that the position vectors $\ell'_i$ of the feature points in the left image are known exactly. We model the error in the corresponding $r_i$ by a vector $\delta r_i$ added to the correct vector $r_{i0}$,

$$r_i = r_{i0} + \delta r_i \qquad (8.98)$$

as shown in Figure 8-2. Since the error, whether systematic or random, is assumed to be uniform over the image, we again set the weights $w_i = 1$ in all of the following derivations.
We will consider the cases of random and systematic errors separately. For random errors we assume that the vectors $\delta r_i$ are independent and identically distributed. This holds because each block is matched independently of the others, and neither the pixel size nor the block size used in the matching procedure depends on position in the image plane. We write $\delta r_i$ as

$$\delta r_i = \begin{pmatrix}\rho_i\cos\beta_i\\ \rho_i\sin\beta_i\\ 0\end{pmatrix} \qquad (8.99)$$

where $\beta_i$ is uniformly distributed over $[0, 2\pi)$ and $\rho_i$ has some probability distribution over $[0, R_{\max}]$ with $E[\rho_i^2] = \sigma^2$. We thus have

$$E[\delta r_i] = 0 \qquad (8.100)$$

and

$$E[\delta r_i^T\delta r_j] = \begin{cases}\sigma^2, & i = j\\ 0, & i \neq j\end{cases} \qquad (8.101)$$

while the covariance matrix, $\Lambda$, is given by

$$\Lambda = E[\delta r_i\,\delta r_j^T] = \begin{cases}\dfrac{\sigma^2}{2}\left(I - \hat z\hat z^T\right), & i = j\\ 0, & i \neq j\end{cases} \qquad (8.102)$$

Systematic errors are modeled by a constant vector $\delta r$ added to all of the vectors $r_{i0}$. In this case we have

$$E[\delta r] = \delta r, \quad\text{and}\quad E[\delta r^T\delta r] = \rho^2 \qquad (8.103)$$

Since the error is constant, its variance is of course zero.

We now examine the first-order effects of these error terms on the estimates of the baseline and the rotation.

8.2.1 First-order error in the baseline (rotation known) 

We consider first the case where the rotation is known exactly. From our definition of the error in equation (8.98), we can write the vectors $c_i$, defined in (7.5), as

$$c_i = \ell'_i\times r_i = \ell'_i\times r_{i0} + \ell'_i\times\delta r_i = c_{i0} + \delta c_i \qquad (8.104)$$

We also have

$$\lambda_i = c_i\cdot b = \delta c_i\cdot b \qquad (8.105)$$

since $c_{i0}\cdot b = 0$ by the definition of $c_{i0}$ as the error-free vector $(\ell'_i\times r_{i0})$. We thus write $C$ as

$$C = \sum_{i=1}^{N}c_ic_i^T = \sum_{i=1}^{N}c_{i0}c_{i0}^T + \sum_{i=1}^{N}\left(c_{i0}\delta c_i^T + \delta c_ic_{i0}^T + \delta c_i\delta c_i^T\right) = C_0 + \Delta C \qquad (8.106)$$

Dropping the last term, which is second-order in the error, we then write

$$\Delta C \approx \sum_{i=1}^{N}\left(c_{i0}\delta c_i^T + \delta c_ic_{i0}^T\right) \qquad (8.107)$$

Let $b_s$ denote the eigenvector corresponding to the smallest eigenvalue of $C$. Let $b_1$, $b_2$, and $b_3$ be the eigenvectors of $C_0$, with eigenvalues $\mu_1$, $\mu_2$, and $\mu_3$, where

$$\mu_3 > \mu_2 > \mu_1 = 0 \qquad (8.108)$$

and

$$|b_1| = |b_2| = |b_3| = 1 \qquad (8.109)$$

Using a result from matrix perturbation theory [91], we can express $b_s$ as a first-order Taylor series in $\Delta C$ and the unperturbed eigenvectors and eigenvalues of $C_0$:

$$b_s \approx b_1 - \frac{b_2^T\Delta Cb_1}{\mu_2}\,b_2 - \frac{b_3^T\Delta Cb_1}{\mu_3}\,b_3 \qquad (8.110)$$

$$\phantom{b_s} = b_1 + \delta b_1 \qquad (8.111)$$

The error vector $\delta b_1$ is perpendicular to $b_1$. Its magnitude, given by

$$|\delta b_1| = \sqrt{\left(\frac{b_2^T\Delta Cb_1}{\mu_2}\right)^2 + \left(\frac{b_3^T\Delta Cb_1}{\mu_3}\right)^2} \qquad (8.112)$$

approximates the angle, $\theta_b$, between $b_1$ and $b_s$.
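The first-order expansion (8.110)–(8.112) can be compared with an exact computation on a small example. The sketch writes $C_0$ in its own eigenbasis, adds an illustrative symmetric $\Delta C$, and finds the smallest eigenvector of $C_0 + \Delta C$ by power iteration on $(sI - C)$. Power iteration is a convenience chosen here, not a method from the text, and the numbers are made up for illustration.

```python
import math

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def smallest_eigvec(M, shift, iters=2000):
    """Unit eigenvector for the smallest eigenvalue of a symmetric 3x3 matrix M,
    found by power iteration on (shift*I - M); shift must exceed M's largest eigenvalue."""
    v = [1.0, 0.3, 0.2]
    for _ in range(iters):
        Mv = matvec(M, v)
        v = [shift * v[i] - Mv[i] for i in range(3)]
        n = math.sqrt(sum(x * x for x in v))
        v = [x / n for x in v]
    return v

# Unperturbed C0 written in its own eigenbasis: b1 = e1 (mu1 = 0), b2 = e2, b3 = e3.
mu2, mu3 = 0.2, 1.0
dC = [[0.001, 0.004, -0.003],        # illustrative symmetric Delta C
      [0.004, 0.002, 0.001],
      [-0.003, 0.001, -0.002]]
C = [[dC[i][j] + (mu2 if i == j == 1 else mu3 if i == j == 2 else 0.0)
      for j in range(3)] for i in range(3)]

# First-order prediction, Eqs. (8.110)-(8.112).
coef2 = -dC[1][0] / mu2              # -(b2^T dC b1) / mu2
coef3 = -dC[2][0] / mu3              # -(b3^T dC b1) / mu3
theta_first_order = math.hypot(coef2, coef3)

# Reference: smallest eigenvector of the perturbed matrix, sign-aligned with b1.
bs = smallest_eigvec(C, shift=2.0)
if bs[0] < 0:
    bs = [-x for x in bs]
theta_exact = math.acos(min(1.0, bs[0]))
print(theta_first_order, theta_exact)
```

The two angles agree to within second-order terms in $|\Delta C|/\mu_2$. The error is dominated by the smaller eigenvalue $\mu_2$, as (8.112) suggests.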

The important quantity in determining the magnitude of the error is clearly the vector $\Delta Cb_1$. From (8.107) we have

$$\Delta Cb_1 = \sum_{i=1}^{N}\left(c_{i0}\delta c_i^T + \delta c_ic_{i0}^T\right)b_1 = \sum_{i=1}^{N}\lambda_i\,c_{i0} \qquad (8.113)$$

Expressions for the estimation error in the cases of random and systematic measurement errors will now be derived separately.

Uncorrelated random error

When the error is uncorrelated, we have from equations (8.105) and (8.100)

$$E[\Delta Cb_1] = \sum_{i=1}^{N}E[\lambda_i]\,c_{i0} = \sum_{i=1}^{N}(b_1\times\ell'_i)^TE[\delta r_i]\,c_{i0} = 0 \qquad (8.114)$$

giving also

$$E[\delta b_1] = 0 \qquad (8.115)$$

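The fact that the expected value of $\delta b_1$ is zero only means that it has no preferred direction. The appropriate measure of the error is the magnitude of $\delta b_1$, which is also, to first order, the angle $\theta_b$ between $b_1$ and $b_s$. We thus compute

$$\bar\theta_b^2 = E[\theta_b^2] = \frac{1}{\mu_2^2}\,b_2^TE\left[\Delta Cb_1b_1^T\Delta C\right]b_2 + \frac{1}{\mu_3^2}\,b_3^TE\left[\Delta Cb_1b_1^T\Delta C\right]b_3 \qquad (8.116)$$

From equation (8.113), we can write the term $\Delta Cb_1b_1^T\Delta C$ as

$$\Delta Cb_1b_1^T\Delta C = \sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j\,c_{i0}c_{j0}^T \qquad (8.117)$$

and hence, by independence of the errors,

$$E\left[\Delta Cb_1b_1^T\Delta C\right] = \sum_{i=1}^{N}\sum_{j=1}^{N}E[\lambda_i\lambda_j]\,c_{i0}c_{j0}^T = \sum_{i=1}^{N}E[\lambda_i^2]\,c_{i0}c_{i0}^T \qquad (8.118)$$

We could approximate this sum as an integral, following the approach taken in Section 8.1.1, and derive an exact expression for $E[\Delta Cb_1b_1^T\Delta C]$. However, we gain more insight into the problem by replacing $E[\lambda_i^2]$ by its average value $\bar\lambda^2$, where

$$\bar\lambda^2 = \frac{1}{N}\sum_{i=1}^{N}E[\lambda_i^2] \qquad (8.119)$$

Making the substitution in equation (8.118), we have

$$E\left[\Delta Cb_1b_1^T\Delta C\right] \approx \bar\lambda^2\sum_{i=1}^{N}c_{i0}c_{i0}^T = \bar\lambda^2C_0 \qquad (8.120)$$

and hence the error $\bar\theta_b^2$ becomes

$$\bar\theta_b^2 = \frac{\bar\lambda^2}{\mu_2^2}\,b_2^TC_0b_2 + \frac{\bar\lambda^2}{\mu_3^2}\,b_3^TC_0b_3 = \bar\lambda^2\left(\frac{1}{\mu_2} + \frac{1}{\mu_3}\right) \qquad (8.121)$$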

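We now need only to find an expression for $\bar\lambda^2$. From equations (8.102) and (8.105), the expected value of $\lambda_i^2$ is

$$\begin{aligned}E[\lambda_i^2] &= E\left[(b_1\times\ell'_i)^T\delta r_i\delta r_i^T(b_1\times\ell'_i)\right]\\ &= \frac{\sigma^2}{2}(b_1\times\ell'_i)^T\left(I - \hat z\hat z^T\right)(b_1\times\ell'_i)\\ &= \frac{\sigma^2}{2}\left[|\ell'_i|^2 - (\ell'_i\cdot b_1)^2 - \left((b_1\times\hat z)\cdot\ell'_i\right)^2\right]\end{aligned} \qquad (8.122)$$

Substituting this equation into (8.119) and approximating the sum by an integral, we have

$$\bar\lambda^2 = \frac{\sigma^2}{2\pi D^2}\int_0^D\!\!\int_0^{2\pi}\left[|\ell'|^2 - (\ell'\cdot b_1)^2 - \left((b_1\times\hat z)\cdot\ell'\right)^2\right]\rho\,d\rho\,d\alpha \qquad (8.123)$$

Then, using the results of Appendix B and simplifying, we obtain

$$\bar\lambda^2 = \frac{\sigma^2}{2}\left[|(b_1\times v_3)\times\hat z|^2 + \frac{D^2}{4}\left(1 + (b_1\cdot v_3)^2 - |(b_1\times\hat z)\times v_3|^2\right)\right] \qquad (8.124)$$

Combining this expression with the formulas for the eigenvalues of $C_0$ given in equations (8.20) and (8.21), we can write $\bar\theta_b^2$ in its most general form as

$$\bar\theta_b^2 = \frac{2\sigma^2}{\kappa N}\left[|(b_1\times v_3)\times\hat z|^2 + \frac{D^2}{4}\left(1 + (b_1\cdot v_3)^2 - |(b_1\times\hat z)\times v_3|^2\right)\right]\left[\frac{1}{D^2} + \frac{1}{D^2(b_1\cdot v_3)^2 + 4|b_1\times v_3|^2}\right] \qquad (8.125)$$

We observe that $\bar\theta_b^2$ is proportional to the error variance, $\sigma^2$, and to the squared distances of objects in the scene. (Recall that $\kappa = E[1/Z^2]$.) We can also see that $\bar\theta_b^2 \to 0$ as $N \to \infty$, as should be expected for an unbiased estimator. The behavior of $\bar\theta_b^2$ as a function of $D$, however, depends on the orientation of $b_1$ with respect to $v_3$. When $b_1 \perp v_3$, equation (8.125) reduces to

$$\bar\theta_b^2 = \frac{\sigma^2\left(4 + D^2\right)}{2\kappa ND^2}\left[|(b_1\times v_3)\times\hat z|^2 + \frac{D^2}{4}|\hat z\times v_3|^2\right] \qquad (8.126)$$

As $D \to 0$, the term in $1/D^2$ dominates, giving

$$\bar\theta_b^2 \approx \frac{2\sigma^2}{\kappa ND^2}\,|(b_1\times v_3)\times\hat z|^2 \qquad (8.127)$$

so that $\bar\theta_b^2 \to \infty$ as the field of view decreases. As $D$ increases, however,

$$\bar\theta_b^2 \approx \frac{\sigma^2}{2\kappa N}\left[|(b_1\times v_3)\times\hat z|^2 + \left(1 + \frac{D^2}{4}\right)|\hat z\times v_3|^2\right] \qquad (8.128)$$

Since $v_3$ is usually very close to $\hat z$, we can neglect the term $|\hat z\times v_3|^2$ for all reasonable values of $D$. We thus expect the error to become constant for large fields of view.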
When $b_1 = v_3$, any vector perpendicular to $v_3$ is an eigenvector of $C_0$ with eigenvalue $\mu = \kappa ND^2/4$. The expected squared error thus becomes

$$\bar\theta_b^2 = \frac{\sigma^2}{\kappa N}\left(2 - |\hat z\times v_3|^2\right) \qquad (8.129)$$

and is now completely independent of the field of view.
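The general result (8.124)–(8.125) can be coded directly, and the two special cases (8.126) and (8.129) then serve as checks. The sketch below assumes unit vectors. It takes $\hat z = (0, 0, 1)$ as the sensor normal, consistent with (8.99), and the numerical values in the usage loop are illustrative only.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

Z_HAT = (0.0, 0.0, 1.0)

def baseline_error_variance(b1, v3, D, sigma, kappa, N):
    """Expected squared angular error of the baseline, Eq. (8.125), for independent
    random matching errors with E[rho^2] = sigma^2. b1 and v3 are unit vectors."""
    beta = dot(b1, v3)
    bxv3 = cross(b1, v3)
    t1 = cross(bxv3, Z_HAT)                        # (b1 x v3) x z_hat
    t2 = cross(cross(b1, Z_HAT), v3)               # (b1 x z_hat) x v3
    lam_sq = 0.5 * sigma ** 2 * (dot(t1, t1)       # Eq. (8.124)
                                 + (D ** 2 / 4) * (1 + beta ** 2 - dot(t2, t2)))
    mu2 = kappa * N * D ** 2 / 4                             # Eq. (8.21)
    mu3 = kappa * N * ((D ** 2 / 4) * beta ** 2 + dot(bxv3, bxv3))  # Eq. (8.20)
    return lam_sq * (1 / mu2 + 1 / mu3)                      # Eq. (8.121)

# Baseline in the image plane vs. along the optical axis (illustrative numbers).
for D in (0.25, 0.5, 1.0):
    print(D,
          baseline_error_variance((1.0, 0.0, 0.0), Z_HAT, D, 0.002, 0.01, 500),
          baseline_error_variance(Z_HAT, Z_HAT, D, 0.002, 0.01, 500))
```

Because the sum $1/\mu_2 + 1/\mu_3$ is symmetric, the result does not depend on which of the two nonzero eigenvalues is larger.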

Systematic error 

When the error vector $\delta r$ is constant, equation (8.113) becomes

$$\Delta Cb_1 = \sum_{i=1}^{N}\lambda_i\,c_{i0} = \sum_{i=1}^{N}c_{i0}\,(b_1\times\ell'_i)^T\delta r \qquad (8.130)$$

Using the rigid-body motion equation (8.9), we can write

$$b_1\times\ell'_i = -Z_{ri}\left(\ell'_i\times r_{i0}\right) \qquad (8.131)$$

and hence

$$\Delta Cb_1 = -\left(\sum_{i=1}^{N}Z_{ri}\,c_{i0}c_{i0}^T\right)\delta r \qquad (8.132)$$

We now approximate the depths $Z_{ri}$ by their average value $\bar Z_r$, so that

$$\Delta Cb_1 = -\bar Z_r\left(\sum_{i=1}^{N}c_{i0}c_{i0}^T\right)\delta r = -\bar Z_r\,C_0\,\delta r \qquad (8.133)$$

Substituting this expression into equation (8.110), we find that the estimated value of the baseline is given by

$$b_s \approx b_1 + \frac{\bar Z_r\,b_2^TC_0\delta r}{\mu_2}\,b_2 + \frac{\bar Z_r\,b_3^TC_0\delta r}{\mu_3}\,b_3 = b_1 + \bar Z_r\left((b_2\cdot\delta r)\,b_2 + (b_3\cdot\delta r)\,b_3\right) \qquad (8.134)$$

which can also be written as

$$b_s = b_1 + \bar Z_r\left(\delta r - (b_1\cdot\delta r)\,b_1\right) \qquad (8.135)$$

This result makes sense: it states that the adjustment to the baseline lies along the component of $\delta r$ perpendicular to $b_1$. The angular error, $\theta_b$, is given by

$$\theta_b \approx |\delta b_1| = \bar Z_r\,|b_1\times(b_1\times\delta r)| = \bar Z_r\,|b_1\times\delta r| \qquad (8.136)$$

With systematic measurement errors, the estimation error is still a function of the distances to objects in the scene. It no longer depends on either the field of view or the number of correspondence points $N$.
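Equations (8.135)–(8.136) give the systematic baseline error in closed form. The sketch below applies them to a single example and checks the small-angle claim against the actual angle between $b_1$ and $b_s$. The inputs are hypothetical: $b_1$ is a unit vector, and the mean depth $\bar Z_r$ is expressed in baseline units.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def systematic_baseline_error(b1, dr, zr_mean):
    """First-order effect of a constant matching error dr on the baseline.
    Returns (b_s, theta_b): b_s = b1 + Zr (dr - (b1.dr) b1), Eq. (8.135), and
    theta_b = Zr |b1 x dr|, Eq. (8.136)."""
    d = sum(x * y for x, y in zip(b1, dr))
    b_s = tuple(bi + zr_mean * (ri - d * bi) for bi, ri in zip(b1, dr))
    theta_b = zr_mean * math.sqrt(sum(c * c for c in cross(b1, dr)))
    return b_s, theta_b

b_s, theta = systematic_baseline_error((1.0, 0.0, 0.0), (0.0005, 0.001, 0.0), 5.0)
print(b_s, theta)   # only the component of dr perpendicular to b1 moves b_s; theta is about 0.005
```

The component of $\delta r$ along $b_1$ has no effect, and neither $N$ nor $D$ appears anywhere in the computation.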

8.2.2 First-order error in the rotation (baseline known) 

We now look at the case where the translation direction is known and examine the error in the rotation estimate. To first order, this error can be associated with the incremental adjustment computed by the Update_q() procedure when the correct $b$ and $q$ are input. Let $q_s$ denote the updated rotation quaternion and $q$ its true value. Using the notation of Section 7.1.2,

$$q_s = \delta q\,q \qquad (8.137)$$

with

$$\delta q = \left(\cos\frac{\delta\theta}{2},\ \hat\eta\sin\frac{\delta\theta}{2}\right) \qquad (8.138)$$

The error quaternion $\delta q$ corresponds to an additional rotation of $\delta\theta$ about an axis $\hat\eta$, applied to the true rotation $q$. From equations (7.12), (7.15), and (7.17), $\delta\theta$ and $\hat\eta$ are computed from the vector $m$, given by

$$m = -\delta\theta\,\hat\eta = A^{-1}h \qquad (8.139)$$
where

$$h = \sum_{i=1}^{N}\lambda_i\,a_i = \sum_{i=1}^{N}\lambda_i\,(b\times r_i)\times\ell'_i \qquad (8.140)$$

We can write $a_i$ as

$$a_i = a_{i0} + \delta a_i \qquad (8.141)$$

with

$$a_{i0} = (b\times r_{i0})\times\ell'_i, \quad\text{and}\quad \delta a_i = (b\times\delta r_i)\times\ell'_i \qquad (8.142)$$

The term $\lambda_i\,\delta a_i$ is second-order in the error and is therefore dropped, leaving

$$h \approx \sum_{i=1}^{N}\lambda_i\,a_{i0}, \quad\text{and}\quad A \approx A_0 = \sum_{i=1}^{N}a_{i0}a_{i0}^T \qquad (8.143)$$

The magnitude of the error in the translation estimate is given by the angle between $b_s$ and $b$. In the same way, the magnitude of the rotation error vector, $|m| = |\delta\theta|$, is related to the angle between $q_s$ and $q$. Taking their inner product, we find

$$q_s\cdot q = (\delta q\,q)\cdot q = \delta q\cdot(q\,q^*) = \delta q\cdot e = \cos\frac{\delta\theta}{2} \qquad (8.144)$$

using identities (A.10) and (A.7) from Appendix A, where $e$ is the identity quaternion.
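Identity (8.144) can be confirmed numerically with a few lines of quaternion arithmetic. The sketch below uses the Hamilton product with quaternions stored as $(w, x, y, z)$, which is an assumed layout. It composes an illustrative "true" rotation with a small error rotation as in (8.137) and recovers $|\delta\theta|$ from the inner product.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions written as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)

def axis_angle_quat(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about the unit vector `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

q = axis_angle_quat((0.0, 0.6, 0.8), 0.9)      # illustrative "true" rotation
dq = axis_angle_quat((1.0, 0.0, 0.0), 0.01)    # error rotation, delta theta = 0.01
qs = qmul(dq, q)                                # Eq. (8.137)
inner = sum(a * b for a, b in zip(qs, q))       # q_s . q
print(inner, math.cos(0.01 / 2.0))              # the two agree, Eq. (8.144)
print(2.0 * math.acos(inner))                   # recovers |delta theta| = 0.01
```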
The squared estimation error is thus given by

$$|\delta\theta|^2 = m^Tm = h^TA_0^{-1}A_0^{-1}h \qquad (8.145)$$

This can be simplified by expanding $A_0$ and $A_0^{-1}$ in terms of their eigenvectors. Let $w_1$, $w_2$, and $w_3$ denote the eigenvectors of $A_0$, with eigenvalues $\mu_1$, $\mu_2$, and $\mu_3$, where

$$\mu_3 > \mu_2 > \mu_1 \qquad (8.146)$$

Because $A_0$ and $A_0^{-1}$ are symmetric, they can be diagonalized as

$$A_0 = SVS^T, \quad\text{and}\quad A_0^{-1} = SV^{-1}S^T \qquad (8.147)$$

where

$$S = (w_1\ w_2\ w_3), \quad V = \mathrm{diag}(\mu_1, \mu_2, \mu_3), \quad\text{and}\quad S^TS = I \qquad (8.148)$$

so that

$$h^TA_0^{-1}A_0^{-1}h = \frac{1}{\mu_1^2}(w_1\cdot h)^2 + \frac{1}{\mu_2^2}(w_2\cdot h)^2 + \frac{1}{\mu_3^2}(w_3\cdot h)^2 \qquad (8.149)$$



The expression for the estimation error will depend on whether the errors in the data 
are systematic or random. These cases are now examined separately. 

Uncorrelated random error 

With random measurement errors, the expected value of $m$ is zero, since

$$E[m] = A_0^{-1}E[h] = A_0^{-1}\sum_{i=1}^{N}E[\lambda_i]\,a_{i0} = 0 \qquad (8.150)$$

This implies, as we should expect, that $m$ has no preferred direction. The variance of the error is thus $E[\delta\theta^2]$, which can be written as

$$\overline{\delta\theta^2} = E[\delta\theta^2] = \frac{1}{\mu_1^2}w_1^TE[hh^T]w_1 + \frac{1}{\mu_2^2}w_2^TE[hh^T]w_2 + \frac{1}{\mu_3^2}w_3^TE[hh^T]w_3 \qquad (8.151)$$

The important quantity to compute is clearly $E[hh^T]$. Using (8.143), we have

$$E[hh^T] = E\left[\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_i\lambda_j\,a_{i0}a_{j0}^T\right] = \sum_{i=1}^{N}E[\lambda_i^2]\,a_{i0}a_{i0}^T \qquad (8.152)$$

We again replace $E[\lambda_i^2]$ by its average value $\bar\lambda^2$, so that

$$E[hh^T] = \bar\lambda^2\sum_{i=1}^{N}a_{i0}a_{i0}^T = \bar\lambda^2A_0 \qquad (8.153)$$



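giving

$$\overline{\delta\theta^2} = \frac{\bar\lambda^2}{\mu_1^2}w_1^TA_0w_1 + \frac{\bar\lambda^2}{\mu_2^2}w_2^TA_0w_2 + \frac{\bar\lambda^2}{\mu_3^2}w_3^TA_0w_3 = \bar\lambda^2\left(\frac{1}{\mu_1} + \frac{1}{\mu_2} + \frac{1}{\mu_3}\right) \qquad (8.154)$$

This is completely analogous to equation (8.121) for the error in the baseline estimate.

We can find an expression for $\overline{\delta\theta^2}$ in the special cases $b \perp v_3$ and $b = v_3$, for which we derived the eigenvalues of $A_0$ in Section 8.1.2. From equations (8.43)–(8.45) we have, in the case $b \perp v_3$,

$$\mu_1 = \frac{\gamma ND^4}{24} \qquad (8.155)$$

$$\mu_2 = \frac{\gamma ND^2}{4} \qquad (8.156)$$

$$\mu_3 = \gamma N\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.157)$$

while $\bar\lambda^2$, given in (8.124), simplifies to

$$\bar\lambda^2 = \frac{\sigma^2}{2}\left[|(b\times v_3)\times\hat z|^2 + \frac{D^2}{4}|v_3\times\hat z|^2\right] \qquad (8.158)$$

The variance of the error is therefore

$$\begin{aligned}\overline{\delta\theta^2} &= \frac{\sigma^2}{2\gamma N}\left[|(b\times v_3)\times\hat z|^2 + \frac{D^2}{4}|v_3\times\hat z|^2\right]\left(\frac{24}{D^4} + \frac{4}{D^2} + \frac{8}{8 + 4D^2 + D^4}\right)\\ &\approx \frac{2\sigma^2}{\gamma N}\left(\frac{6}{D^4} + \frac{1}{D^2} + \frac{2}{8 + 4D^2 + D^4}\right)\end{aligned} \qquad (8.159)$$

after dropping the term $|v_3\times\hat z|^2$, which is usually negligible. With $b \perp v_3$ this also makes $|(b\times v_3)\times\hat z| \approx 1$.

We see that $\overline{\delta\theta^2}$ is proportional to the variance of the error in the data, $\sigma^2$, and inversely proportional to the number of correspondences, $N$. As $D \to 0$, the error in the rotation estimate increases rapidly ($\overline{\delta\theta^2} \sim 1/D^4$), while for large values of $D$ it eventually goes to zero.

When $b = v_3$ we have

$$\mu_1 = \mu_2 = \frac{\gamma ND^2}{4}, \quad\text{and}\quad \mu_3 = \frac{\gamma ND^4}{3} \qquad (8.160)$$

and

$$\bar\lambda^2 = \frac{\sigma^2D^2}{8}\left(2 - |\hat z\times v_3|^2\right) \qquad (8.161)$$

The expression for $\overline{\delta\theta^2}$ thus becomes

$$\overline{\delta\theta^2} = \frac{\sigma^2}{\gamma N}\left(2 - |\hat z\times v_3|^2\right)\left(1 + \frac{3}{8D^2}\right) \qquad (8.162)$$

Again, $\overline{\delta\theta^2}$ is proportional to $\sigma^2/N$. Its behavior as a function of $D$, however, is much less severe than in the case $b \perp v_3$. As $D \to 0$, $\overline{\delta\theta^2}$ increases only as $1/D^2$, and as $D \to \infty$ it approaches a constant value.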
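Equations (8.159) and (8.162) can be cross-checked against the general form (8.154) using the eigenvalues from Section 8.1.2. The sketch below does this and prints how the two cases scale with $D$. It drops $|\hat z\times v_3|^2$ by default, as the text does. Parameter values are illustrative, and the function names are hypothetical.

```python
def rotation_error_var_perpendicular(D, sigma, gamma, N):
    """Eq. (8.159), b perpendicular to v3, with |v3 x z_hat|^2 neglected."""
    return (2 * sigma**2 / (gamma * N)) * (6 / D**4 + 1 / D**2 + 2 / (8 + 4 * D**2 + D**4))

def rotation_error_var_parallel(D, sigma, gamma, N, z_cross_v3_sq=0.0):
    """Eq. (8.162), b = v3."""
    return (sigma**2 / (gamma * N)) * (2 - z_cross_v3_sq) * (1 + 3 / (8 * D**2))

def from_eigenvalues(lam_sq, mus):
    """Eq. (8.154): lambda_bar^2 times the sum of reciprocal eigenvalues of A0."""
    return lam_sq * sum(1 / mu for mu in mus)

sigma, gamma, N = 0.002, 1.0, 500          # illustrative values only
for D in (0.25, 0.5, 1.0, 2.0):
    print(D, rotation_error_var_perpendicular(D, sigma, gamma, N),
          rotation_error_var_parallel(D, sigma, gamma, N))
```

Halving $D$ multiplies the first variance by about 16 and the second by about 4 for small fields of view. These factors correspond to the $1/D^4$ and $1/D^2$ behavior described above.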

Systematic error 

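In the case of systematic measurement errors, we must derive an exact expression for $h$ in order to obtain a formula for $|\delta\theta|^2$. From equations (8.105), (8.142), and (8.143), we have

$$h = \sum_{i=1}^{N}\lambda_i\,a_{i0} = \sum_{i=1}^{N}\left((b\times r_{i0})\times\ell'_i\right)\,\ell'_i\cdot(\delta r\times b) \qquad (8.163)$$

Using (8.31), we can also write

$$h = \sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}}\left((b\cdot\ell'_i)\,\ell'_i\ell'^T_i - |\ell'_i|^2\,b\,\ell'^T_i\right)(\delta r\times b) \qquad (8.164)$$

We approximate $Z_{li}/Z_{ri}$ by its average value and assume that the correspondences are distributed uniformly over the field of view. Replacing the summation by an integral, we obtain

$$h \approx \frac{N}{\pi D^2}\left(\frac{\bar Z_l}{\bar Z_r}\right)\int_0^D\!\!\int_0^{2\pi}\left((b\cdot\ell')\,\ell'\ell'^T - |\ell'|^2\,b\,\ell'^T\right)(\delta r\times b)\,\rho\,d\rho\,d\alpha \qquad (8.165)$$

The solution to this integral can be obtained by combining equations (B.12) and (B.28) of Appendix B. Skipping the algebra, the result is

$$h = N\left(\frac{\bar Z_l}{\bar Z_r}\right)\left[(b\cdot v_3)\left(\frac{D^2}{4}I + \left(1 - \frac{3D^2}{4}\right)v_3v_3^T\right) - \left(1 + \frac{D^2}{4}\right)b\,v_3^T\right](\delta r\times b) \qquad (8.166)$$

We can also express the average value of $\lambda_i$ as

$$\bar\lambda = \frac{1}{N}\sum_{i=1}^{N}\lambda_i \approx \frac{1}{\pi D^2}\int_0^D\!\!\int_0^{2\pi}(\delta r\times b)^T\ell'\,\rho\,d\rho\,d\alpha = (\delta r\times b)\cdot v_3 \qquad (8.167)$$

Using this expression in equation (8.166) then gives

$$h = N\left(\frac{\bar Z_l}{\bar Z_r}\right)\left[\frac{D^2}{4}(b\cdot v_3)(\delta r\times b) + \bar\lambda\left(1 - \frac{3D^2}{4}\right)(b\cdot v_3)\,v_3 - \bar\lambda\left(1 + \frac{D^2}{4}\right)b\right] \qquad (8.168)$$

Again, the behavior of the error as a function of $D$ depends on the orientation of $b$ and $v_3$. When $b \perp v_3$, $h$ becomes

$$h = -N\bar\lambda\left(\frac{\bar Z_l}{\bar Z_r}\right)\left(1 + \frac{D^2}{4}\right)b \qquad (8.169)$$

Because $b$ is an eigenvector of $A_0$ with eigenvalue

$$\mu_b = \gamma N\left(1 + \frac{D^2}{2} + \frac{D^4}{8}\right) \qquad (8.170)$$

we find from equation (8.149) that the error $|\delta\theta|$ becomes

$$|\delta\theta| = \frac{2|\bar\lambda|}{\gamma}\left(\frac{\bar Z_l}{\bar Z_r}\right)\frac{4 + D^2}{8 + 4D^2 + D^4} \qquad (8.171)$$

The error does not depend on $N$, but it does depend on the field of view. As $D \to 0$, the error approaches the finite value $(|\bar\lambda|/\gamma)(\bar Z_l/\bar Z_r)$. As $D \to \infty$, $|\delta\theta| \to 0$ as $1/D^2$.

When $b \parallel v_3$, $h$ becomes

$$h = \frac{ND^2}{4}\left(\frac{\bar Z_l}{\bar Z_r}\right)(\delta r\times b) \qquad (8.172)$$

In this case, any vector perpendicular to $b$ is an eigenvector of $A_0$ with eigenvalue $\mu_\perp = \gamma ND^2/4$. We thus have

$$m = \frac{1}{\gamma}\left(\frac{\bar Z_l}{\bar Z_r}\right)(\delta r\times b) \qquad (8.173)$$

and

$$|\delta\theta| = |m| = \frac{1}{\gamma}\left(\frac{\bar Z_l}{\bar Z_r}\right)|\delta r\times b| \qquad (8.174)$$

The error is now independent of both $D$ and $N$.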
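The systematic rotation errors (8.171) and (8.174) are simple enough to tabulate directly. The sketch below also checks (8.171) against $|h|/\mu_b$ computed from (8.169)–(8.170), and confirms its limits as $D \to 0$ and $D \to \infty$. The inputs are hypothetical: $\bar\lambda = (\delta r\times b)\cdot v_3$ and the depth ratio $\bar Z_l/\bar Z_r$ are supplied as plain numbers.

```python
def rotation_error_systematic_perpendicular(lam_bar, gamma, depth_ratio, D):
    """|delta theta| from Eq. (8.171) for b perpendicular to v3.
    lam_bar = (dr x b) . v3, depth_ratio = mean(Zl) / mean(Zr)."""
    return (2 * abs(lam_bar) / gamma) * depth_ratio * (4 + D**2) / (8 + 4 * D**2 + D**4)

def rotation_error_systematic_parallel(dr_cross_b_norm, gamma, depth_ratio):
    """|delta theta| from Eq. (8.174) for b parallel to v3; independent of D and N."""
    return dr_cross_b_norm * depth_ratio / gamma

for D in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(D, rotation_error_systematic_perpendicular(1e-3, 1.0, 1.0, D))
```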

8.2.3 Coupling between estimation errors 

In the last two sections we have analyzed the estimation errors in the cases when either the baseline or the rotation is known. We now examine the situation when neither is known. Let $S^*$ denote the minimum value of $S = \sum_{i=1}^{N}\Delta_i^2$, and let $S_0$ denote the value when the correct $\mathbf{b}$ and $\mathbf{q}$ corresponding to the actual motion are used. We can expand $S$ in a first-order Taylor series to approximate $S^*$ as

$$S^* \approx S_0 + \left.\frac{\partial S}{\partial\mathbf{b}}\right|_{S=S_0}^{T}\delta\mathbf{b} + \left.\frac{\partial S}{\partial\mathbf{m}}\right|_{S=S_0}^{T}\mathbf{m}_0 \tag{8.175}$$



Assuming $S^*$ corresponds to a true local minimum and is therefore unique, this expression defines a relation between $\delta\mathbf{b}$ and $\mathbf{m}_0$ which must be (approximately) satisfied to achieve optimality. The values of $\delta\mathbf{b}$ and $\mathbf{m}_0$ used to obtain $S^*$ from $S_0$ therefore cannot be the same as those which minimize the error in the case of known rotation or known translation since, by definition, each of these assumes the other to be zero.

In order to determine a constraint between $\delta\mathbf{b}$ and $\mathbf{m}_0$, we need to use the necessary conditions for $S^*$ to be a (local) minimum, which are

$$\mathbf{C}\mathbf{b} = \mu\mathbf{b} \tag{8.176}$$

where $\mu$ is the smallest eigenvalue of $\mathbf{C}$, and

$$\mathbf{h} = \sum_{i=1}^{N}\Delta_i\mathbf{a}_i = \mathbf{0} \tag{8.177}$$

Since there is always at least one solution to the eigenvector equation (8.176) for every rotation, it does not provide any new information. We thus need to find solutions to (8.176) that are also consistent with (8.177).

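Defining $\mathbf{b}_0$ as the true translation vector, we have at $S = S^*$

$$\begin{aligned}\Delta_i &= \mathbf{b}\cdot(\boldsymbol{\ell}'_i\times\mathbf{r}_i)\\ &= (\mathbf{b}_0+\delta\mathbf{b})\cdot\big((\boldsymbol{\ell}'_{i0}+\delta\boldsymbol{\ell}'_i)\times(\mathbf{r}_{i0}+\delta\mathbf{r}_i)\big)\\ &\approx \delta\mathbf{b}\cdot(\boldsymbol{\ell}'_{i0}\times\mathbf{r}_{i0}) + \mathbf{b}_0\cdot(\delta\boldsymbol{\ell}'_i\times\mathbf{r}_{i0}) + \mathbf{b}_0\cdot(\boldsymbol{\ell}'_{i0}\times\delta\mathbf{r}_i)\end{aligned} \tag{8.178}$$

where we have used $\mathbf{b}_0\cdot(\boldsymbol{\ell}'_{i0}\times\mathbf{r}_{i0}) = 0$ for error-free data and dropped terms that are second-order and higher in the error. Since $\Delta_i$ contains all of the first-order error, we may also take $\mathbf{a}_i \approx \mathbf{a}_{i0}$.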

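From equations (7.10) and (7.12) we obtain

$$\delta\boldsymbol{\ell}'_i = \boldsymbol{\ell}'_{i0}\times\mathbf{m}_0 \tag{8.179}$$

where $\mathbf{m}_0$ represents the incremental rotation vector applied to the true rotation to get to $S = S^*$, and we can thus write

$$\mathbf{b}_0\cdot(\delta\boldsymbol{\ell}'_i\times\mathbf{r}_{i0}) = \mathbf{b}_0\cdot\big((\boldsymbol{\ell}'_{i0}\times\mathbf{m}_0)\times\mathbf{r}_{i0}\big) = -\mathbf{a}_{i0}^T\mathbf{m}_0 \tag{8.180}$$

Using also the fact that

$$\mathbf{b}_0\times\boldsymbol{\ell}'_{i0} = -Z_{ri}(\boldsymbol{\ell}'_{i0}\times\mathbf{r}_{i0}) = -Z_{ri}\,\mathbf{c}_{i0} \tag{8.181}$$

equation (8.178) becomes

$$\begin{aligned}\Delta_i &= \delta\mathbf{b}\cdot\mathbf{c}_{i0} - \mathbf{a}_{i0}^T\mathbf{m}_0 - Z_{ri}\,\mathbf{c}_{i0}\cdot\delta\mathbf{r}_i\\ &= \mathbf{c}_{i0}^T(\delta\mathbf{b} - Z_{ri}\,\delta\mathbf{r}_i) - \mathbf{a}_{i0}^T\mathbf{m}_0\end{aligned} \tag{8.182}$$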

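Combining these expressions with (8.177), we find that the necessary condition for optimality is

$$\sum_{i=1}^{N}\mathbf{a}_{i0}\left(\mathbf{c}_{i0}^T(\delta\mathbf{b} - Z_{ri}\,\delta\mathbf{r}_i) - \mathbf{a}_{i0}^T\mathbf{m}_0\right) = \mathbf{0} \tag{8.183}$$

which can also be written as

$$\sum_{i=1}^{N}\mathbf{a}_{i0}\mathbf{c}_{i0}^T(\delta\mathbf{b} - Z_{ri}\,\delta\mathbf{r}_i) = \sum_{i=1}^{N}\mathbf{a}_{i0}\mathbf{a}_{i0}^T\mathbf{m}_0 = \mathbf{A}\mathbf{m}_0 = \mathbf{h}_0 \tag{8.184}$$

where $\mathbf{h}_0$ is the value of $\mathbf{h}$ at $S = S_0$.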

If the measurement errors are random and uncorrelated, we have from equations (8.100), (8.115), and (8.150)

$$E[\delta\mathbf{r}_i] = E[\delta\mathbf{b}] = E[\mathbf{h}_0] = \mathbf{0} \tag{8.185}$$

and hence equation (8.184) does not provide any new information. With random measurement errors, the expected values of $\delta\mathbf{b}$ and $\mathbf{m}_0$ are zero whether the rotation and translation are estimated separately or together. Consequently, equations (8.121) and (8.154) for the variances of the translation and rotation estimates are also still valid.

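With systematic measurement errors, however, the situation is different. We can write

$$\begin{aligned}\sum_{i=1}^{N}\mathbf{a}_{i0}\mathbf{c}_{i0}^T &= -\sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}^2}\big((\mathbf{b}_0\times\boldsymbol{\ell}'_{i0})\times\boldsymbol{\ell}'_{i0}\big)(\mathbf{b}_0\times\boldsymbol{\ell}'_{i0})^T\\ &= \sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}^2}\left((\mathbf{b}_0\cdot\boldsymbol{\ell}'_{i0})\,\boldsymbol{\ell}'_{i0}\boldsymbol{\ell}'^T_{i0} - |\boldsymbol{\ell}'_{i0}|^2\,\mathbf{b}_0\boldsymbol{\ell}'^T_{i0}\right)\mathbf{B}_{\times 0}\end{aligned} \tag{8.186}$$

where $\mathbf{B}_{\times 0}$ is the cross-product matrix corresponding to the operation $\mathbf{b}_0\times$, and combine equations (8.184) and (8.186), with $\delta\mathbf{r}_i = \delta\mathbf{r}$ for every $i$, to give

$$\sum_{i=1}^{N}\frac{Z_{li}}{Z_{ri}}\left((\mathbf{b}_0\cdot\boldsymbol{\ell}'_{i0})\,\boldsymbol{\ell}'_{i0}\boldsymbol{\ell}'^T_{i0} - |\boldsymbol{\ell}'_{i0}|^2\,\mathbf{b}_0\boldsymbol{\ell}'^T_{i0}\right)\left(\frac{1}{Z_{ri}}(\mathbf{b}_0\times\delta\mathbf{b}) + (\delta\mathbf{r}\times\mathbf{b}_0)\right) = \mathbf{h}_0 \tag{8.187}$$

Comparing this equation with the definition of $\mathbf{h}_0$ given in equation (8.164), we see that a solution exists only if $\delta\mathbf{b} = \mathbf{0}$, which implies that the estimation error is entirely absorbed by the rotation.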

This is a significant practical result, as it implies that one can obtain very accurate estimates of the translation, even with poorly calibrated systems, when the full algorithm is used. We have to be careful, however, before concluding that $\delta\mathbf{b}$ is always zero when the rotation and translation are estimated jointly, because $\mathbf{h}$ is only a linear approximation to the derivative of $S$ with respect to the rotation. We can write $S_0$ as

$$\begin{aligned}S_0 &= \sum_{i=1}^{N}\big((\delta\mathbf{r}\times\mathbf{b}_0)\cdot\boldsymbol{\ell}'_{i0}\big)^2\\ &\approx \frac{N}{\pi D^2}\int_0^{D}\!\!\int_0^{2\pi}(\delta\mathbf{r}\times\mathbf{b}_0)^T\boldsymbol{\ell}'\boldsymbol{\ell}'^T(\delta\mathbf{r}\times\mathbf{b}_0)\,\xi\,d\xi\,d\alpha\\ &= N\lambda^2 + \frac{ND^2}{4}\left(|\delta\mathbf{r}\times\mathbf{b}_0|^2 - \lambda^2\right)\end{aligned} \tag{8.188}$$

using the solution for the integral given in equation (B.19) and the definition $\lambda = (\delta\mathbf{r}\times\mathbf{b}_0)\cdot\hat{\mathbf{v}}_3$ from equation (8.167).
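Equation (8.188) can be checked numerically for an arbitrary baseline direction. The sketch assumes unit focal length, $\hat{\mathbf{v}}_3 = \hat{\mathbf{z}}$, and an equal-area polar grid standing in for the uniform density over the disk of radius $D$, and compares the discrete sum $S_0 = \sum_i((\delta\mathbf{r}\times\mathbf{b}_0)\cdot\boldsymbol{\ell}'_i)^2$ with the closed form. The names and the particular $\mathbf{b}_0$ and $\delta\mathbf{r}$ are illustrative choices.

```python
import numpy as np

def disk_grid(D, n_rings=400, n_angles=64):
    """Image vectors l = (xi*cos(a), xi*sin(a), 1) on an equal-area polar grid of radius D."""
    xi = D * np.sqrt((np.arange(n_rings) + 0.5) / n_rings)
    alpha = 2.0 * np.pi * np.arange(n_angles) / n_angles
    XI, AL = np.meshgrid(xi, alpha, indexing="ij")
    XI, AL = XI.ravel(), AL.ravel()
    return np.column_stack([XI * np.cos(AL), XI * np.sin(AL), np.ones_like(XI)])

def s0_sum(b0, dr, D):
    """Discrete S0 = sum_i ((dr x b0) . l_i)^2 over the grid; also returns N."""
    L = disk_grid(D)
    u = np.cross(dr, b0)
    return np.sum((L @ u) ** 2), len(L)

def s0_closed_form(b0, dr, D, N):
    """Right-hand side of (8.188) with v3 = z."""
    u = np.cross(dr, b0)
    lam = u[2]  # lambda = (dr x b0) . v3
    return N * lam**2 + N * D**2 / 4 * (u @ u - lam**2)

b0 = np.array([1.0, 0.0, 1.0]) / np.sqrt(2.0)  # oblique baseline, neither parallel nor perpendicular
dr = np.array([0.003, 0.01, 0.0])
for D in (0.364, 0.840, 1.732):
    s, N = s0_sum(b0, dr, D)
    print(D, s, s0_closed_form(b0, dr, D, N))
```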

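When $\mathbf{b}_0 \parallel \hat{\mathbf{v}}_3$, $\lambda = 0$, so that

$$S_0 = \frac{ND^2}{4}|\delta\mathbf{r}\times\mathbf{b}_0|^2 \tag{8.189}$$

From equations (8.172) and (8.173) we have

$$\mathbf{h}_0^T\mathbf{m}_0 = \frac{1}{\gamma}\left(\overline{Z_l/Z_r}\right)^2\frac{ND^2}{4}|\delta\mathbf{r}\times\mathbf{b}_0|^2 \approx \frac{ND^2}{4}|\delta\mathbf{r}\times\mathbf{b}_0|^2 \tag{8.190}$$

after setting

$$\frac{1}{\gamma}\left(\overline{Z_l/Z_r}\right)^2 \approx 1 \tag{8.191}$$

and thus when $\mathbf{b}_0$ and $\hat{\mathbf{v}}_3$ are aligned, the quantity $-\mathbf{h}_0^T\mathbf{m}_0$ exactly cancels $S_0$.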
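The role of $\mathbf{h}_0^T\mathbf{m}_0$ follows from a standard least-squares identity. If the residuals are taken to be exactly linear in the rotation increment, $\Delta_i(\mathbf{m}) = \Delta_i - \mathbf{a}_i^T\mathbf{m}$ as in (8.182) with $\delta\mathbf{b} = \mathbf{0}$, then minimizing $S(\mathbf{m}) = \sum_i\Delta_i(\mathbf{m})^2$ gives the normal equations $\mathbf{A}\mathbf{m} = \mathbf{h}$ and the minimum $S^* = S_0 - \mathbf{h}^T\mathbf{m}$, so a zero minimum requires $\mathbf{h}_0^T\mathbf{m}_0$ to cancel $S_0$. The sketch below verifies the identity with random stand-in data (not data from the thesis).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
a = rng.normal(size=(N, 3))     # hypothetical stand-ins for the vectors a_i
delta0 = rng.normal(size=N)     # hypothetical residuals Delta_i at the true motion

A = a.T @ a                     # A = sum_i a_i a_i^T
h = a.T @ delta0                # h = sum_i Delta_i a_i
m = np.linalg.solve(A, h)       # normal equations A m = h

S0 = delta0 @ delta0
S_star = np.sum((delta0 - a @ m) ** 2)               # minimum of the linearized S
m_lstsq = np.linalg.lstsq(a, delta0, rcond=None)[0]  # same minimizer via lstsq

print(S0, S_star, S0 - h @ m)
```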

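When $\mathbf{b}_0 \perp \hat{\mathbf{v}}_3$, however, the term $|\delta\mathbf{r}\times\mathbf{b}_0|^2 - \lambda^2$ in the equation for $S_0$ becomes

$$\begin{aligned}|\delta\mathbf{r}\times\mathbf{b}_0|^2 - \lambda^2 &= |\delta\mathbf{r}\times\mathbf{b}_0|^2 - \big((\delta\mathbf{r}\times\mathbf{b}_0)\cdot\hat{\mathbf{v}}_3\big)^2\\ &= |(\delta\mathbf{r}\times\mathbf{b}_0)\times\hat{\mathbf{v}}_3|^2\\ &= (\delta\mathbf{r}\cdot\hat{\mathbf{v}}_3)^2\\ &\approx 0\end{aligned} \tag{8.192}$$

where the third step uses $(\delta\mathbf{r}\times\mathbf{b}_0)\times\hat{\mathbf{v}}_3 = (\delta\mathbf{r}\cdot\hat{\mathbf{v}}_3)\,\mathbf{b}_0$ for a unit $\mathbf{b}_0$ perpendicular to $\hat{\mathbf{v}}_3$, and the last step holds since $\delta\mathbf{r}$ is parallel to the image plane and $\hat{\mathbf{v}}_3 \approx \hat{\mathbf{z}}$. We thus have

$$S_0 \approx N\lambda^2 \tag{8.193}$$

while equations (8.169) and (8.171) combine to give

$$\mathbf{h}_0^T\mathbf{m}_0 = \frac{N\lambda^2}{2\gamma}\left(\overline{Z_l/Z_r}\right)^2\frac{(4+D^2)^2}{8+4D^2+D^4} \approx N\lambda^2\left(1 - \frac{D^4}{2(8+4D^2+D^4)}\right) \tag{8.194}$$

The quantity $-\mathbf{h}_0^T\mathbf{m}_0$ now cancels $S_0$ only in the limit as $D \to 0$. We thus conclude that the constraint equation (8.187) is only a loose approximation in the case of $\mathbf{b}_0 \perp \hat{\mathbf{v}}_3$, since the nonlinear terms which were neglected must become significant for large values of $D$ in order for the conditions of optimality to be obtained.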
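Under the approximation (8.191), the fraction of $S_0$ cancelled by $\mathbf{h}_0^T\mathbf{m}_0$ when $\mathbf{b}_0 \perp \hat{\mathbf{v}}_3$ follows from (8.193) and (8.194) as $(4+D^2)^2/\big(2(8+4D^2+D^4)\big)$. The short sketch below evaluates this expression for the three fields of view used in the next section, showing how the cancellation degrades as $D$ grows. It is a direct evaluation of the formula, not a simulation.

```python
def cancellation_ratio(D):
    """h0^T m0 / S0 for b0 perpendicular to v3, from (8.193)-(8.194) with (8.191)."""
    return (4 + D**2) ** 2 / (2 * (8 + 4 * D**2 + D**4))

for phi_deg, D in ((20, 0.364), (40, 0.840), (60, 1.732)):
    print(f"phi={phi_deg} deg, D={D}: ratio={cancellation_ratio(D):.4f}")
```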

8.3 Experimental Verification 

In order to test the results derived in this chapter, it was necessary to generate a random data set of 3-D coordinates corresponding to world points in the scene. This was done by first generating the Z values according to a specified probability distribution and then selecting the X and Y values so that the position vectors would be uniformly distributed over the specified field of view.

Figure 8-3: Probability density function used to obtain random Z values.

The probability density function used to assign Z values for the tests is shown in Figure 8-3 and was determined based on a rough estimate of the distribution of depths that are typically encountered in practice. The mean value of Z was set at 19 baseline units, which means that if the camera moves 20 cm between frames, the average distance to any object in the scene would be 3.8 m. Once the value of Z is selected, $\xi$ and $\alpha$ are chosen uniformly over the intervals $[0, D]$ and $[0, 2\pi)$, respectively, so that X and Y are computed as

$$X = Z\xi\cos\alpha, \qquad Y = Z\xi\sin\alpha \tag{8.195}$$

Given the set of 3-D points, $\mathbf{p}_l$, the set of correspondences, $\mathbf{p}_r$, is then generated by applying the rigid body motion equation

$$\mathbf{p}_r = \mathbf{R}\mathbf{p}_l + \mathbf{b} \tag{8.196}$$

to each point in $\mathbf{p}_l$, for some value of $\mathbf{R}$ and $\mathbf{b}$. We thus obtain an error-free list of $N$ pairs $\{(\mathbf{r}_i, \boldsymbol{\ell}_i)\}$, where

$$\mathbf{r}_i = \frac{1}{Z_{ri}}\mathbf{p}_{ri}, \qquad \boldsymbol{\ell}_i = \frac{1}{Z_{li}}\mathbf{p}_{li} \tag{8.197}$$
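The data-generation steps (8.195) to (8.197) can be sketched as follows. The depth density of Figure 8-3 is not given analytically in the text, so a gamma distribution with a mean of 19 baseline units is used here purely as a hypothetical stand-in; $\xi$ and $\alpha$ are drawn uniformly as described above, and the rotation is built with Rodrigues' formula. Function names are illustrative. Because the data are error-free, every pair satisfies the coplanarity condition $\mathbf{b}\cdot(\mathbf{R}\boldsymbol{\ell}_i\times\mathbf{r}_i) = 0$ to within rounding, which makes a convenient sanity check.

```python
import numpy as np

def rotation_about_axis(omega, theta):
    """Rodrigues' formula: rotation by theta (radians) about the axis omega."""
    w = np.asarray(omega, dtype=float) / np.linalg.norm(omega)
    K = np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def make_correspondences(N, D, R, b, rng, mean_Z=19.0):
    """Error-free image-vector pairs (l_i, r_i) following (8.195)-(8.197)."""
    # Hypothetical depth model standing in for Figure 8-3 (gamma, mean = mean_Z).
    Z = rng.gamma(shape=4.0, scale=mean_Z / 4.0, size=N)
    xi = rng.uniform(0.0, D, N)                 # drawn uniformly, as the text describes
    alpha = rng.uniform(0.0, 2.0 * np.pi, N)
    p_l = np.column_stack([Z * xi * np.cos(alpha), Z * xi * np.sin(alpha), Z])  # (8.195)
    p_r = p_l @ R.T + b                         # (8.196), applied to each row
    l = p_l / p_l[:, 2:3]                       # (8.197): divide by Z_l
    r = p_r / p_r[:, 2:3]                       # (8.197): divide by Z_r
    return l, r

rng = np.random.default_rng(0)
R = rotation_about_axis([0.0, 0.0, 1.0], np.deg2rad(5.0))  # omega = (0, 0, 1), theta = 5 degrees
b = np.array([1.0, 0.0, 0.0])
l, r = make_correspondences(50, np.tan(np.deg2rad(20.0)), R, b, rng)
residual = np.cross(l @ R.T, r) @ b   # coplanarity residual b . (R l_i x r_i)
print(np.max(np.abs(residual)))
```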

Measurement errors are then simulated by determining a vector $\delta\mathbf{r}_i$,

$$\delta\mathbf{r}_i = \begin{pmatrix}\rho_i\cos\beta_i\\ \rho_i\sin\beta_i\\ 0\end{pmatrix} \tag{8.198}$$

as previously defined in equation (8.99).

The magnitude of the error, $\rho_i$, is specified in units of the focal length, $f$, so that it is independent of both the field of view and the spatial discretization of the sensor. Since the pixel pitch in these units is $2D/n$, where $n$ is the number of pixels along one dimension of the sensor, $\rho_i$ can be converted from units of $f$ to pixels by dividing it by $2D/n$ (that is, multiplying by $n/(2D)$). For example, with a 20° field of view ($D = 0.364$) on a 256×256 pixel array, $\rho = 0.0028f$ corresponds to 1 pixel.
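A minimal helper for the unit conversion just described might look as follows; the function names are illustrative, and the only assumption is that $n$ pixels span the full width $2D$ (in units of $f$) of the image.

```python
def pixel_pitch(D, n):
    """Pixel pitch in units of the focal length f, for n pixels spanning a width of 2D."""
    return 2.0 * D / n

def f_units_to_pixels(rho, D, n):
    """Convert an error magnitude rho (units of f) to pixels."""
    return rho / pixel_pitch(D, n)

def pixels_to_f_units(pixels, D, n):
    """Convert a distance in pixels to units of f."""
    return pixels * pixel_pitch(D, n)

print(pixels_to_f_units(1, 0.364, 256))      # one pixel at D = 0.364 on a 256-pixel axis
print(f_units_to_pixels(0.0028, 0.364, 256))
```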

For systematic errors, the direction $\delta\mathbf{r}$ and the magnitude $\rho$ are given as input to the simulation program. For these tests, the direction of $\delta\mathbf{r}$ was set equal to $\hat{\mathbf{y}}$ for simplicity, and a value was selected for $\rho$ between 0 and $0.025f$. For uncorrelated random errors, each vector $\delta\mathbf{r}_i$ was determined independently by choosing $\beta_i$ uniformly over the interval $[0, 2\pi)$ and $\rho_i$ over the interval $[0, R_{\max}]$, where $R_{\max}$ was supplied as an input parameter. Given that the vectors $\delta\mathbf{r}_i$ are uniformly distributed over a disk of area $\pi R_{\max}^2$, we thus have

$$E[\delta\mathbf{r}_i^T\delta\mathbf{r}_i] = E[\rho_i^2] = \frac{R_{\max}^2}{2} = \sigma^2 \tag{8.199}$$

Values of $R_{\max}$ were chosen in the simulations to give $\sigma$ between 0 and $0.02f$.
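A sketch of the random-error generator is given below, with illustrative names. One design point deserves care: drawing $\rho_i$ uniformly on $[0, R_{\max}]$ would give $E[\rho_i^2] = R_{\max}^2/3$, whereas (8.199) assumes $\delta\mathbf{r}_i$ is uniform over the disk, which gives $R_{\max}^2/2$. The sketch therefore draws $\rho_i = R_{\max}\sqrt{u}$ with $u$ uniform on $[0, 1)$, which produces the uniform-disk density; to target a given $\sigma$ one would set $R_{\max} = \sigma\sqrt{2}$.

```python
import numpy as np

def random_image_errors(N, R_max, rng):
    """delta r_i = (rho cos beta, rho sin beta, 0), uniformly distributed over a disk of radius R_max."""
    beta = rng.uniform(0.0, 2.0 * np.pi, N)
    rho = R_max * np.sqrt(rng.uniform(0.0, 1.0, N))  # sqrt gives uniform density per unit area
    return np.column_stack([rho * np.cos(beta), rho * np.sin(beta), np.zeros(N)])

rng = np.random.default_rng(0)
sigma = 0.01 / np.sqrt(2.0)          # example target, units of f
R_max = sigma * np.sqrt(2.0)         # from (8.199): sigma^2 = R_max^2 / 2
dr = random_image_errors(200_000, R_max, rng)
print(np.mean(np.sum(dr**2, axis=1)), sigma**2)
```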

In order to compare the equations derived in Sections 8.2.1 and 8.2.2 for the translation and rotation error variances with the estimation errors actually computed for data corrupted by random error, several tests were performed for different types of motion and different values of $D$ and $N$. The results of two series of tests, one with $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and the other with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, are shown in Figures 8-4 and 8-5. In both series the rotation was given by $\hat{\boldsymbol{\omega}} = (0, 0, 1)$ with $\theta = 5^\circ$, so that $\hat{\mathbf{v}}_3 = \hat{\mathbf{z}}$, while the translation for the first series was specified as $\mathbf{b} = (1, 0, 0)$ and in the second as $\mathbf{b} = (0, 0, 1)$. The same set of 3-D coordinates $\mathbf{p}_l$, with $N = 50$, $\bar{Z} = 18.95$, and $k = E[1/Z^2] = 0.0062$, was used for all tests. For each motion, actual and predicted errors for the translation and the rotation were computed for viewing fields of $\phi = 20^\circ$, $\phi = 40^\circ$, and $\phi = 60^\circ$ ($D = 0.364$, $D = 0.840$, and $D = 1.732$).

Figure 8-4: Estimation errors in translation and rotation with $\mathbf{b} \perp \hat{\mathbf{v}}_3$ (uncorrelated measurement errors), $\mathbf{b} = (1, 0, 0)$, $\hat{\boldsymbol{\omega}} = (0, 0, 1)$, $\theta = 5^\circ$.

Figure 8-5: Estimation errors in translation and rotation with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$ (uncorrelated measurement errors), $\mathbf{b} = (0, 0, 1)$, $\hat{\boldsymbol{\omega}} = (0, 0, 1)$, $\theta = 5^\circ$.

Figure 8-6: Actual and predicted values of $\mu_2/\mu_3$ for $\mathbf{b} \perp \hat{\mathbf{v}}_3$, for $\phi = 20^\circ$, $40^\circ$, and $60^\circ$, with $N = 50$ and $N = 100$.

Figure 8-7: Actual and predicted values of $\mu_2/\mu_3$ for $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, for $\phi = 20^\circ$, $40^\circ$, and $60^\circ$, with $N = 50$ and $N = 100$.

Since the equations for the error variances only predict expected values, a total of 16 different data sets were used for each test and each value of $\sigma$ in order to obtain a statistically significant approximation of the average error. The data for each test were generated from the same set of ideal correspondences computed for the given motion by initializing the random number generator used to determine the $\delta\mathbf{r}_i$ with different seeds. The errors, shown in Figures 8-4 and 8-5, are plotted as circles when they were computed with either the rotation or the translation given, and as asterisks when the full motion was estimated. In many cases, particularly with $\phi = 20^\circ$ and $\mathbf{b} \perp \hat{\mathbf{v}}_3$, the estimation of the translation was unstable in the sense that the eigenvector corresponding to the smallest eigenvalue of $\mathbf{C}$ was not the one closest to the true translation. In these cases, the error reported is the smallest angle between the true value of $\mathbf{b}$ and either of the other two eigenvectors, and is marked on the graph with the symbol 'U'.

The average of the computed errors agrees well with the predicted values in most cases. There is a considerable spread in the results for both the baseline and rotation errors for motions with $\mathbf{b} \perp \hat{\mathbf{v}}_3$; however, this is to be expected given the higher sensitivity to error in this case. There is much less spread in the results when $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, and also many fewer instances of instability, although there does appear to be an increase in the spread of translation estimates at larger fields of view. On closer examination, however, it can be seen that the variation in this case is mostly in the errors from the full motion estimation, while errors in computing the translation with known rotation cluster well about the predicted average. The reason for the increased variation in the estimates from the complete algorithm, in the case of large viewing fields with $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, is not apparent from the first-order analysis of this chapter, although we can conjecture that it is caused by the nonlinear terms which were neglected.

The predictions of Sections 8.1.1 and 8.1.3 for the ratio of the middle and largest eigenvalues of $\mathbf{C}$ were also tested for these simulated motions. In Figures 8-6 and 8-7 the ratios predicted from the estimated $\mathbf{b}$ and $\hat{\mathbf{v}}_3$, as well as the actual values computed from the matrix $\mathbf{C}$ itself, are shown for each value of $\sigma$ and $\phi$. As before, each test was performed 16 times with different sets of randomized errors added to the correspondence data. The dashed lines indicate the best-fitting curves to the results from the 16 tests, plotted as a function of $\sigma$. It should be noted that the results from all of the tests are shown, including those for which the translation estimate was unstable. The actual ratios are always computed from the middle and largest eigenvalues of $\mathbf{C}$. However, the predicted ratio is computed using the eigenvector closest to the true baseline as the estimate of the translation.

The first observation which can be made based on these tests is that there is a difference between the predicted and actual values of $\mu_2/\mu_3$ when $\mathbf{b} \perp \hat{\mathbf{v}}_3$, which increases with the field of view. This effect can be attributed to the fact that the value $D = \tan\phi$ was used to compute the predicted ratio, while it probably would have been better to use $D/\sqrt{2}$, which is the root-mean-square distance of points from the center of the image under the uniform-density model and a more appropriate statistical measure of the radius of the viewing field. With $\mathbf{b}$ exactly perpendicular to $\hat{\mathbf{v}}_3$, the predicted ratio from equation (8.23) is $\mu_2/\mu_3 = D^2/4$. However, the actual ratios agree much better with $D^2/8$, which for $\phi = 20^\circ$, $40^\circ$, and $60^\circ$ would give predicted ratios of 0.016, 0.09, and 0.375, respectively.
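The two candidate predictions can be tabulated directly: $D^2/4$ from equation (8.23) with $D = \tan\phi$, and the same expression with the root-mean-square radius $D/\sqrt{2}$ substituted, which reduces to $D^2/8$. This is only arithmetic on the formulas quoted above; the function name is illustrative.

```python
import math

def predicted_ratios(phi_deg):
    """(D^2/4 with D = tan(phi), same formula with the RMS radius D/sqrt(2), i.e. D^2/8)."""
    D = math.tan(math.radians(phi_deg))
    return D**2 / 4, (D / math.sqrt(2)) ** 2 / 4

for phi in (20, 40, 60):
    full, rms = predicted_ratios(phi)
    print(f"phi={phi}: D^2/4={full:.4f}, D^2/8={rms:.4f}")
```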

Since equations (8.20) and (8.21) for the eigenvalues of $\mathbf{C}$ were derived by modeling the discrete correspondences as a uniform density spread over the field of view, tests were performed with both $N = 50$ and $N = 100$ to assess how strongly the eigenvalue ratios are affected by this approximation, which clearly depends on $N$. With $\mathbf{b} \perp \hat{\mathbf{v}}_3$ there is not a significant difference in the results for the values of $N$ tested; however, there is a large difference for $\mathbf{b} \parallel \hat{\mathbf{v}}_3$. This difference can be explained by noting that the ratio $\mu_2/\mu_3$ is a measure of the symmetry in the distribution of the vectors $\mathbf{c}_i = \boldsymbol{\ell}'_i\times\mathbf{r}_i$ in the plane perpendicular to $\mathbf{b}$. The equality of the two largest eigenvalues in the case $\mathbf{b} \parallel \hat{\mathbf{v}}_3$ predicted in equation (8.25) is thus a direct consequence of the uniform density assumption. Nonetheless, the values of $\mu_2/\mu_3$ are seen to be consistently higher when $\mathbf{b} \parallel \hat{\mathbf{v}}_3$ than when $\mathbf{b} \perp \hat{\mathbf{v}}_3$, except in the case of $\phi = 60^\circ$ with $N = 50$, where the values are similar. It can also be seen that the difference in the ratios is largely unaffected either by errors in the data or by the fact that the algorithm sometimes converged to the wrong solution.
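The dependence of $\mu_2/\mu_3$ on the uniform-density assumption can be reproduced in a few lines. The sketch assumes unit weights in $\mathbf{C} = \sum_i\mathbf{c}_i\mathbf{c}_i^T$ (the exact form of (7.6) may include weights), $\mathbf{R} = \mathbf{I}$ and $Z_r = 1$ so that $\mathbf{c}_i \propto \boldsymbol{\ell}_i\times\mathbf{b}$, and $\hat{\mathbf{v}}_3 = \hat{\mathbf{z}}$. An equal-area polar grid stands in for the uniform density, and a small random sample mimics a finite set of correspondences. With the grid, $\mu_2/\mu_3$ comes out as $D^2/4$ for $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and 1 for $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, whereas a finite random sample breaks the equality in the parallel case. Function names are illustrative.

```python
import numpy as np

def disk_grid(D, n_rings=200, n_angles=64):
    """Equal-area polar grid over the disk: the uniform-density model of the analysis."""
    xi = D * np.sqrt((np.arange(n_rings) + 0.5) / n_rings)
    alpha = 2.0 * np.pi * np.arange(n_angles) / n_angles
    XI, AL = np.meshgrid(xi, alpha, indexing="ij")
    XI, AL = XI.ravel(), AL.ravel()
    return np.column_stack([XI * np.cos(AL), XI * np.sin(AL), np.ones_like(XI)])

def random_disk(D, N, rng):
    """N points drawn uniformly over the disk, like a finite set of correspondences."""
    xi = D * np.sqrt(rng.uniform(0.0, 1.0, N))
    alpha = rng.uniform(0.0, 2.0 * np.pi, N)
    return np.column_stack([xi * np.cos(alpha), xi * np.sin(alpha), np.ones(N)])

def eigen_ratio(L, b):
    """mu2 / mu3 of C = sum c_i c_i^T with c_i = l_i x b (unit weights, R = I, Z_r = 1 assumed)."""
    c = np.cross(L, b)
    mu = np.linalg.eigvalsh(c.T @ c)  # eigenvalues in ascending order
    return mu[1] / mu[2]

D = 0.364
x, z = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
print("uniform density, b perp v3:", eigen_ratio(disk_grid(D), x), "vs D^2/4 =", D**2 / 4)
print("uniform density, b par v3: ", eigen_ratio(disk_grid(D), z))
rng = np.random.default_rng(0)
print("N = 50 random points, b par v3:", eigen_ratio(random_disk(D, 50, rng), z))
```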

It can thus be concluded that the ratio test is an effective indicator for discriminating between correct and false solutions. However, in order to implement the test on a real system, it is necessary to first develop a baseline profile of the expected ratios for translations parallel and perpendicular to the image plane, rather than predicting their values from equations (8.20) and (8.21), since these will depend on the actual geometry of the sensor and focal length of the lens, as well as on the average number of correspondence points returned by the matching procedure.

Finally, the case of systematic measurement errors was investigated for $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, with the results shown in Figures 8-8 and 8-9 for $\phi = 20^\circ$, $40^\circ$, and $60^\circ$. Since the errors are deterministic, it was not necessary to make numerous tests, as in the case of random data, and so the results of only one simulation are shown for each motion and field of view.

Figure 8-8: Errors in estimates of translation and rotation with $\mathbf{b} \perp \hat{\mathbf{v}}_3$ (systematic measurement errors), $\mathbf{b} = (1, 0, 0)$, $\hat{\boldsymbol{\omega}} = (0, 0, 1)$, $\theta = 5^\circ$.

It can be seen that the actual errors agree well with those predicted by equations (8.136), (8.171), and (8.174). For translation with known rotation, the errors are essentially independent of the field of view for both $\mathbf{b} \perp \hat{\mathbf{v}}_3$ and $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, although there is a slight difference in both cases for $\phi = 20^\circ$. For the estimates of rotation with known translation, the errors do increase as $\phi$ decreases when $\mathbf{b} \perp \hat{\mathbf{v}}_3$, but are independent of $\phi$ when $\mathbf{b} \parallel \hat{\mathbf{v}}_3$.

When the full motion is computed, the errors in the rotation scarcely change from those computed with the translation known. Errors in the translation estimates, however, are reduced drastically. With $\mathbf{b} \perp \hat{\mathbf{v}}_3$, the error is removed almost entirely at $\phi = 20^\circ$ and is only slightly worse at larger fields of view. With $\mathbf{b} \parallel \hat{\mathbf{v}}_3$, on the other hand, the error is essentially removed at all values of $\phi$.

As previously noted, the fact that one can obtain very accurate estimates of the translation from the complete algorithm when the measurement errors are correlated is significant, as it implies that accurate internal camera calibration is not required to obtain useful information for many applications, such as passive navigation, that do not require highly precise estimates of the rotation.



Chapter 9 



Summary of Part I 



Many issues have been covered in the preceding chapters, and it is useful to summarize 
the major results and conclusions. 

In the architecture outlined in Chapter 4, it was determined that processing in the
motion system should be divided into three stages: 

• Edge detection, 

• Feature matching by block correlation of the binary edge maps, and 

• Solving the motion equations. 

Edge detection would be performed directly on the analog signals acquired by the pho- 
tosensors using a fully parallel analog array processor implementing the multi-scale veto 
algorithm. In Chapter 5, this edge detection algorithm was shown to have the following 
advantages over classical methods: 

• There is no tradeoff between edge localization and smoothing. Noise and unwanted 
minor features can be effectively removed by adjusting the sequence of thresholds 
applied at each smoothing cycle. The edges which are detected, however, remain lo- 
calized at the positions of the features in the original image regardless of the thresholds 
used. 

• The method does not require computing second differences or searching for zero- 
crossings. Hence the circuitry is much simpler, and all processing is local. 
• The algorithm is designed to take advantage of the signal processing capabilities of
CCDs. It can thus be efficiently implemented on a CCD array with circuitry placed 
at each pixel to compute differences and store edge signals. 

At the end of Chapter 3, it was concluded that the most appropriate method for ob- 
taining the correspondence points needed by the motion algorithm was a block-correlation 
procedure using the binary edge maps produced in the first processing stage. Due to the 
presence of repeating patterns which occur naturally in real scenes, however, similarity mea- 
sures alone cannot determine the best matches and achieve an acceptably low false-alarm 
rate. In Chapter 6, a series of tests was developed to add to the correlation procedure 
in order to minimize the error rate. These tests included rejecting matches from blocks 
which have too few or too many edge pixels to give a low probability of a false match, and
rejecting altogether blocks for which there were multiple possible matches. 

Chapter 7 covered the development of the algorithm to estimate motion from the set of 
point matches found in stage 2. Due to the complexity and nonlinearity of the equations 
involved, it was determined that the motion algorithm should be executed on a standard 
digital processor. Nonetheless, given the goal of building a low-power system, it was neces- 
sary to simplify the algorithm as much as possible so that minimal processing power would 
be required. It was shown that by alternating the procedures for updating the baseline 
and the rotation, the complexity of the operations could be considerably reduced such that 
the most complex computation would be to solve a 3x3 eigenvalue-eigenvector equation 
at each iteration. Simulations of the algorithm on several image sequences using the point 
correspondences determined by the edge detection algorithm and the matching procedure 
showed that these simple methods developed for efficient implementation in VLSI were 
as effective for obtaining accurate estimates of the motion as more complex procedures 
commonly implemented in software. 

Finally, we could not build a robust system for computing motion without studying the 
effects of measurement errors on the estimates, and how these effects are related to the type 
of motion and scene structure, as well as to the spatial resolution of the image sensor, the 
field of view, and the number of correspondence points found. In Chapter 8, the numerical 
stability of the motion algorithm was thoroughly analyzed and expressions were derived for 
the expected estimation error in the cases of both random and systematic measurement 
errors. Three important results were obtained from this analysis: 
• A test was developed for reliably determining when the motion algorithm converges
to an incorrect solution based on the ratio of the two largest eigenvalues of the matrix 
C, defined in equation (7.6). 

• Design guidelines were developed for building a system with the appropriate sensor 
resolution, field of view, and number of matching circuits to achieve a given maximum 
expected estimation error. 

• It was also discovered that precise internal camera calibration was not required to 
obtain accurate estimates of the translation, provided that the rotation is estimated 
as well. This significant result implies that applications such as navigation which 
require accurate knowledge of the translation direction, but which are less sensitive 
to errors in the rotation, can implement the motion system without also needing 
sophisticated calibration procedures. 
Chapter 10



Basic Requirements 



The plan for the design of the multi-scale veto (MSV) edge detector was outlined in 
Chapter 5. A two-dimensional CCD array as shown in Figure 5-3 is ideally suited for 
performing the successive smoothing operations required by the algorithm. The remaining tasks of computing the differences between the smoothed brightness values at neighboring pixels and testing whether the magnitude of these differences is above a given threshold are then performed by additional circuitry placed between each pair of nodes within the array.

Several considerations were important in determining the design of the different elements 
in the MSV edge detection processor, of which one of the major concerns was the silicon 
area required for each pixel. Since the number of pixels in the image array directly impacts 
the robustness of the motion estimates which can be derived from the edge maps, it is 
important that each cell in the array be as small as possible. One of the ways to reduce the 
per-pixel area, which is already incorporated in the design, is by using time as a dimension. 
Since the threshold tests are performed sequentially at the end of each smoothing cycle, 
only one difference and test circuit is required for each node pair. However, this also means 
that space is needed to store the intermediate results and that the internal circuits must be 
fast enough to complete all of the tests within the time allotted for processing the image. 

Small area implies simple circuits. However, if the edge detector is to produce useful results, the need for simplicity cannot compromise the resolution requirements of the algorithm. The attenuation factors for several idealized image features were given in Table 5.1 as a function of the number of smoothing cycles performed. If equation (5.8) is used to compute the thresholds $\tau_k$ for each smoothing cycle $k = 0, \ldots, n$, the internal difference and test circuits must be able to resolve differences as large as $\tau_0$, the initial threshold, and as small as $\tau_n = G_n\tau_0$, where $G_n$ is the attenuation of the chosen model feature after $n$ smoothing cycles.

To translate this requirement into a percentage of the full scale range (FSR), we can take a specific example with $n = 5$ using the horizontal step edge as the model feature. In grayscale images with normal contrast, $\tau_0$ is usually set at around 10% of FSR, and from Table 5.1, the attenuation factor for the horizontal step edge after 5 cycles is seen to be $G_5 = 0.246$. The range of distinguishable differences must thus be between 10% and 2.5% of FSR, or in terms of bits of precision, between 4 and 5 bits. If either the diagonal step edge or the horizontal 2-pixel line is used as the model, the resolution requirement jumps to between 4 and 6 bits.

Designing a small absolute-value-of-difference circuit with this much resolution has been
one of the major challenges in building a working MSV chip. In the smoothing and segmen- 
tation chip designed by Keast [86], a CCD-based absolute value of difference circuit was 
used, primarily because of its small size with respect to a transistor-based design. A sim- 
ilar structure was also included in the stereo disparity chip designed by Hakkarainen [65]. 
Unfortunately, for reasons which Keast discovered and which will be discussed in the next 
chapter, the CCD circuit has a 'dead zone' for small differences which limits its resolution 
to less than 25% of FSR. It was thus necessary to design a new transistor-based absolute- 
value-of-difference circuit occupying the least area possible. 

Given the above constraints, a 32x32 prototype array implementing the MSV algorithm 
was designed and fabricated through MOSIS using the Orbit 2 μm CCD/CMOS process.
The next three chapters are devoted to discussing the design and testing of this array and to 
analyzing the changes required to build a full-size (~256x256) processor, possibly operating 
as an image sensor as well as an edge detector. In order to clarify aspects of the design 
involving CCDs, the following chapter describes the basic physics of charge storage and 
transfer and the input and output structures used to interface with CCD arrays. Chapter 12 
covers in detail the design of each of the major components in the MSV array, and finally, 
Chapter 13 describes the test system and results. 



Chapter 11 



Charge Coupled Device Fundamentals 



Charge coupled devices are based on the principle that a charge packet may be confined 
within a potential well created by applying a voltage to a polysilicon gate and may be moved 
from one location to another by appropriately manipulating the gate voltages. Conceptually, 
it is useful to think of CCDs as buckets and of the signal charge as water. The process of 
charge transfer is similar to that of moving the water by placing the buckets on risers, as 
shown in Figure 11-1, with tubes connecting the bases of neighboring buckets. The water 
levels in adjacent buckets are determined by the relative heights of the risers, just as charge 
levels under adjacent gates of a CCD are determined by the relative difference in the well 
potentials. 

The physics of CCD operation is of course considerably different from that of the bucket 
brigade. CCDs exist in two forms: surface channel and buried channel devices. Both 
operate on the same principle of charge transfer; however, their physical structure and 
device characteristics are very different. It is useful to examine both structures in order to
understand the advantages and limitations of each. 

11.1 Surface Channel Devices 

The simplest form of CCD, the surface-channel device, is constructed from a series of adjacent MOS capacitors operating in deep-depletion mode. Figure 11-2 shows the typical structure of a MOS capacitor formed by sandwiching a thin layer of oxide, SiO2, between a polysilicon gate and a p-type semiconductor substrate. When a voltage greater than the flatband voltage $V_{FB}$ is applied to the gate, the substrate is depleted of majority carriers and a depletion layer is formed, as shown in Figure 11-3a. If the gate voltage is raised above the threshold voltage $V_T$ of the material, defined as the point at which strong inversion occurs [92], minority electrons are attracted to the Si-SiO2 interface and the depletion region ceases to increase in depth (Figure 11-3b).

Figure 11-1: 'Bucket brigade' analogy of charge coupled devices.

In order for the inversion channel to form, there must be a supply of available electrons. In an NMOS transistor, electrons in the conducting channel are supplied by the metal contact made to the source diffusion. In the MOS capacitor, the electrons must come from the substrate, where they are produced by thermal generation of electron-hole pairs. Thermal generation in the bulk results in a flow of electrons from the substrate to the high-potential region at the surface, known as dark current, which continues until equilibrium conditions are obtained. In a well-designed process, however, the dark current density, $J_D$, is typically less than 1 nA/cm². At this level, the time required for the device to reach equilibrium is on the order of minutes [86].
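The "order of minutes" figure can be made plausible with a back-of-envelope estimate: the time for dark current alone to supply the inversion charge is roughly $t \approx Q_{inv}/J_D$, with $Q_{inv} \approx C_{ox}(V_g - V_T)$ per unit area. The oxide thickness (50 nm) and gate overdrive (5 V) below are hypothetical illustrative values, not parameters from the text or from any particular process.

```python
EPS0 = 8.854e-14          # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS0       # permittivity of SiO2, F/cm

def equilibration_time(t_ox_cm, overdrive_V, J_D_A_per_cm2):
    """Rough time (s) for dark current to supply the inversion charge C_ox * (V_g - V_T)."""
    C_ox = EPS_OX / t_ox_cm              # oxide capacitance per unit area, F/cm^2
    Q_inv = C_ox * overdrive_V           # inversion charge per unit area, C/cm^2
    return Q_inv / J_D_A_per_cm2

# Hypothetical values: 50 nm oxide, 5 V above threshold, J_D = 1 nA/cm^2.
t = equilibration_time(50e-7, 5.0, 1e-9)
print(f"about {t:.0f} s, i.e. {t / 60:.1f} minutes")
```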

CCDs exploit this long equilibration time to perform useful signal processing tasks. 
When $V_g$ is raised above $V_T$, the depletion region initially extends beyond its maximum
equilibrium depth, as shown in Figure 11-3c. This is the condition known as deep depletion.
Signal charge may be introduced into the device either optically or electrically and will be 
confined to the potential well until its maximum capacity is reached. The maximum charge 
which can be held is equal to the channel charge of the capacitor at equilibrium and is 
a linear function of the applied gate voltage. By placing gates connected to independent 
voltages adjacent to one another, the signal charge may be transferred between gates just 
as the water is moved in the bucket brigade. 
Figure 11-2: The MOS capacitor.

The primary advantage of surface channel devices is the linear relation between $V_g$
and $Q_{max}$, the maximum signal charge. These devices suffer, however, from poor transfer
efficiency due to the high number of interface states at the Si-SiO2 boundary, which can
trap charge and then release it at a random time afterward. Trapping results in both noise 
and signal degradation when the charge must be transferred through a long series of gates. 
Consequently, surface channel devices are never used in the design of large high-quality 
sensors. Their chief application is in performing operations where the linearity of signal 
charge with gate voltage is important and where only a few gates are needed. 

11.2 Buried Channel Devices (BCCDs) 

Like the surface channel device, the buried channel CCD is also a non-equilibrium struc- 
ture. In the BCCD, however, the signal charge is held away from the Si-SiOz interface be- 
cause the potential maximum occurs inside the channel, several hundred nanometers below 
the surface. The buried channel is created by adding an ra-doped implant below the transfer 
gates. Electrical contact is made to the channel at n-\- diffusions placed at the extremities 
of the gate array, and when a sufficiently large positive voltage is applied with respect to 
the substrate, the buried layer is completely depleted of majority carriers. 

The potential profile, $\phi(x)$, with depth in the semiconductor typically resembles the
curves shown in Figure 11-4 [93], with the dotted and solid lines representing the profiles
with and without signal charge, respectively. Here $x$ represents depth below the oxide layer,
with $x = 0$ at the Si-SiO$_2$ interface. As seen in the diagram, the addition of the buried layer
Figure 11-3: States of the MOS capacitor: (a) depletion, $V_t > V_g > V_{fb}$; (b) strong inversion (equilibrium), $V_g > V_t$; (c) deep depletion (non-equilibrium), $V_g > V_t$.

Figure 11-4: Potential profile with depth of a buried channel CCD

creates a non-monotonic profile such that the maximum potential $\phi_{max}$ resides at a distance
$x_{max}$ below the interface. If electrons are injected into the layer, they will accumulate near
$x_{max}$ rather than at the surface.

The BCCD structure may be modeled as a series connection of lumped capacitances, as
illustrated in Figure 11-5. $C_{ox}$ represents the oxide capacitance, $\epsilon_{ox}/t_{ox}$, while $C_{d1}$ and $C_{d2}$
represent the depletion capacitances between the channel and the oxide and between the channel and
the substrate, respectively. The signal charge occupies a finite width, $W_{ch}$, which cannot
be neglected in computing the values of $C_{d1}$ and $C_{d2}$. Conventionally, this width is divided
equally between the two depletion capacitances [92], resulting in the following expressions:



and 
The effective capacitance $C_{eff}$ between the signal charge and the gate is the series
combination of $C_{ox}$ and $C_{d1}$:

$$C_{eff} = \left(\frac{1}{C_{ox}} + \frac{1}{C_{d1}}\right)^{-1}$$

Figure 11-5: Lumped capacitance model of a BCCD
We can immediately see two effects of the buried channel. The first is that the signal-carrying
capacity is less than that of a surface channel device due to the decrease in the
effective capacitance caused by moving the charge away from the surface. Second, the 
effective capacitance depends nonlinearly on both the amount of charge present and, as will 
be seen shortly, on the gate voltage. 

Nonetheless, buried channel devices offer significant advantages in charge transfer efficiency.
By keeping the signal charge away from the Si-SiO$_2$ surface, the interaction of the
charge packet with traps at the interface is essentially eliminated. Although bulk traps also 
occur, they are much less frequent than those at the interface, and from a process stand- 
point, it is much easier to reduce the bulk state density than that of interface states [92]. 
Furthermore, the reduced effective capacitance between the signal charge and the gate in- 
creases the fringing fields, which are the dominant driving force in the final stages of charge 
transfer, as will be seen in Section 11.4. BCCDs are thus the structure of choice for large 
array sensors. 

In order to use a buried channel CCD for signal processing, we must first understand the
relationships between gate voltage $V_g$, signal charge $Q_{sig}$ (coulombs/cm$^2$), and the channel
potential $\phi(x)$. From this we can compute the magnitude of the lateral potential barrier
between two adjacent gates held at different voltages, and hence the maximum signal charge 
per unit area which can be confined. This information is necessary in order to determine 
the design parameters for the transfer gates, as well as for the charge input and output 
structures. 

Under the depletion approximation, the charge density profile of a buried channel device
at gate voltage $V_g$ containing $N_e$ electrons/μm$^2$ is as shown in Figure 11-6 [93]. We
use $N_D$ to denote the doping concentration (donors/μm$^3$) of the buried channel and $N_A$
(acceptors/μm$^3$) to denote that of the p-type substrate. The signal charge distributes itself
over a finite width $W_{ch} = N_e/N_D$ due to the attraction of the negatively charged electrons
to the positively charged fixed donor ions in the lattice. The depth of the charge packet,
which ends at $x = x_{max}$, is limited by the depletion region created by the junction between
the n-type implant and the p-type substrate. The widths of the space charge regions on
either side of the junction are related by the charge balance equation

$$N_A x_p = N_D (x_n - x_{max}) \qquad (11.5)$$

where $x_p$ represents the extent of the depletion region into the substrate and $x_n$ is the
depth of the n-p junction.
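The potential profile $\phi(x)$ is obtained by integrating Poisson's equation

$$\frac{d^2\phi}{dx^2} = -\frac{\rho(x)}{\epsilon_{Si}} \qquad (11.6)$$

in each region of constant charge density shown in Figure 11-6. Within these four regions,
Poisson's equation is given by

$$\frac{d^2\phi}{dx^2} = -\frac{qN_D}{\epsilon_{Si}}, \qquad 0 < x < x_{max} - W_{ch} \qquad (11.7)$$

$$\frac{d^2\phi}{dx^2} = 0, \qquad x_{max} - W_{ch} < x < x_{max} \qquad (11.8)$$

$$\frac{d^2\phi}{dx^2} = -\frac{qN_D}{\epsilon_{Si}}, \qquad x_{max} < x < x_n \qquad (11.9)$$

$$\frac{d^2\phi}{dx^2} = \frac{qN_A}{\epsilon_{Si}}, \qquad x_n < x < x_n + x_p \qquad (11.10)$$

Figure 11-6: Charge density profile of a buried channel CCD with signal charge $Q_{sig}$
(using depletion approximation)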



The constants of integration are determined by the continuity of $\phi(x)$ at
the region boundaries and the continuity of the electric displacement field at $x = 0$ and
at the remaining region boundaries. These conditions are stated as:
Referring to Figure 11-4, we see that the surface potential $\phi_s = \phi(0)$ satisfies

$$\phi_s = V_g - V_{fb} + V_{ox} \qquad (11.16)$$

and since $V_{ox} = (t_{ox}/\epsilon_{ox})$(fixed charge density + mobile charge density) [93], it follows that

$$V_{ox} = \frac{q\,t_{ox}}{\epsilon_{ox}}\left(N_D x_{max} - N_e\right) = \frac{q\,t_{ox}}{\epsilon_{ox}}\,N_D (x_{max} - W_{ch}) \qquad (11.17)$$

The electric field across the oxide is given by

$$E_{ox} = -\frac{V_{ox}}{t_{ox}} = -\frac{q}{\epsilon_{ox}}\,N_D (x_{max} - W_{ch}) \qquad (11.18)$$



Direct integration of equations (11.7)-(11.10), applying the constraints (11.13) and (11.14),
results in

$$\phi(x) = -\frac{qN_D}{2\epsilon_{Si}}\bigl(x - (x_{max} - W_{ch})\bigr)^2 + \phi_{max}, \qquad 0 \le x \le x_{max} - W_{ch} \qquad (11.19)$$

$$\phi(x) = \phi_{max}, \qquad x_{max} - W_{ch} \le x \le x_{max} \qquad (11.20)$$

$$\phi(x) = -\frac{qN_D}{2\epsilon_{Si}}(x - x_{max})^2 + \phi_{max}, \qquad x_{max} \le x \le x_n \qquad (11.21)$$

$$\phi(x) = \frac{qN_A}{2\epsilon_{Si}}(x - x_n - x_p)^2, \qquad x_n \le x \le x_n + x_p \qquad (11.22)$$

From equations (11.11), (11.16), (11.17), and (11.19) we can obtain a first equation for
$\phi_{max}$:

$$\phi_{max} = V_g - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}}\,N_D (x_{max} - W_{ch}) + \frac{qN_D}{2\epsilon_{Si}}(x_{max} - W_{ch})^2 \qquad (11.23)$$

while a second equation may be found by equating the potential across the n-p junction at
$x = x_n$ and combining equations (11.5), (11.21), and (11.22):

$$\phi_{max} = \frac{qN_D}{2\epsilon_{Si}}(x_n - x_{max})^2\left(1 + \frac{N_D}{N_A}\right) \qquad (11.24)$$

Equating the right-hand sides of equations (11.23) and (11.24), we obtain a quadratic
equation which can be solved for $x_{max}$. Given $x_{max}$ we can find $\phi_{max}$, and therefore $\phi(x)$,
everywhere within the silicon.
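The quadratic can be solved numerically as a sanity check. The following is a minimal Python sketch, not the procedure used in the thesis: it expands (11.23) = (11.24) into $a x_{max}^2 + b x_{max} + c_0 = 0$ and keeps the root lying between $W_{ch}$ and $x_n$. It assumes $\epsilon_{Si} = 11.7\,\epsilon_0$, uses the Orbit values of Table 12.1 only as illustrative inputs, and the function name `solve_buried_channel` is hypothetical.

```python
import math

Q_E = 1.602e-19               # electron charge [C]
EPS_SI = 11.7 * 8.854e-18     # assumed permittivity of silicon [F/um]


def solve_buried_channel(v_g, w_ch, n_d, n_a, c_ox, x_n, v_fb):
    """Return (x_max, phi_max) by equating equations (11.23) and (11.24).

    Units: lengths in um, doping in um^-3, c_ox in F/um^2, voltages in V.
    """
    k = Q_E * n_d / (2.0 * EPS_SI)   # qN_D / (2 eps_Si)     [V/um^2]
    c = Q_E * n_d / c_ox             # q t_ox N_D / eps_ox   [V/um]
    r = 1.0 + n_d / n_a              # junction factor in (11.24)
    v = v_g - v_fb
    # (11.23) - (11.24) = 0 rewritten as a*x^2 + b*x + c0 = 0 with x = x_max
    a = k - k * r
    b = -2.0 * k * w_ch + c + 2.0 * k * r * x_n
    c0 = k * w_ch ** 2 - c * w_ch + v - k * r * x_n ** 2
    disc = b * b - 4.0 * a * c0
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    roots = [(-b + s * math.sqrt(disc)) / (2.0 * a) for s in (1.0, -1.0)]
    # Physical root: the packet must fit between W_ch and the junction depth x_n
    valid = [x for x in roots if w_ch <= x <= x_n]
    if not valid:
        raise ValueError("no physical root: packet too large or channel not depleted")
    x_max = valid[0]
    phi_max = k * r * (x_n - x_max) ** 2   # equation (11.24)
    return x_max, phi_max


# Table 12.1 values converted to um units (1 cm^-3 = 1e-12 um^-3)
ND, NA, COX, XN, VFB = 3.8e4, 5.05e3, 8.10e-16, 0.4, 0.6
x_empty, phi_empty = solve_buried_channel(0.0, 0.0, ND, NA, COX, XN, VFB)
x_full, phi_full = solve_buried_channel(0.0, 0.05, ND, NA, COX, XN, VFB)
print(f"empty gate:    x_max = {x_empty:.3f} um, phi_max = {phi_empty:.2f} V")
print(f"W_ch = 0.05um: x_max = {x_full:.3f} um, phi_max = {phi_full:.2f} V")
```

With these inputs and $V_g = 0$ V, the empty gate gives $x_{max} \approx 0.28$ μm and $\phi_{max} \approx 3.75$ V, and a 0.05 μm packet lowers $\phi_{max}$ to about 2.9 V, in line with $\phi_{max}$ decreasing as $W_{ch}$ grows.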
The maximum number of electrons per unit gate area which can be held in the channel
is given by:

$$N_{e,max} = N_D\,x_{max} \qquad (11.25)$$

Of more interest, however, is the maximum number which may be confined under one
gate that is at a higher voltage than its neighboring gates. Charge is trapped as long
as the channel potential under one gate is higher than that of its neighbors. As seen from
equation (11.23), $\phi_{max}$ is a decreasing function of $W_{ch} = N_e/N_D$. The maximum value of $N_e$
is therefore the one for which $\phi_{max}$ is equal to the channel potential under the neighboring
gate.

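Let $V_{g1}$ denote the voltage on the neighboring gate, which has zero signal charge
($W_{ch} = 0$), and let $V_{g2}$ denote the voltage on the gate containing charge $-qN_{e,max}$. From
equation (11.23), we obtain

$$\phi_{max1} = V_{g1} - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}}\,N_D\,x_{max1} + \frac{qN_D}{2\epsilon_{Si}}\,x_{max1}^2 \qquad (11.26)$$

and

$$\phi_{max2} = V_{g2} - V_{fb} + \frac{q\,t_{ox}}{\epsilon_{ox}}\,N_D\left(x_{max2} - \frac{N_{e,max}}{N_D}\right) + \frac{qN_D}{2\epsilon_{Si}}\left(x_{max2} - \frac{N_{e,max}}{N_D}\right)^2 \qquad (11.27)$$

Equating the right-hand sides of the above expressions and noting that equal $\phi_{max}$ implies
equal $x_{max}$, as seen from equation (11.24), we obtain

$$N_{e,max} = \frac{C_{eff}}{q}\,(V_{g2} - V_{g1}) \qquad (11.28)$$

where $C_{eff}$ is evaluated with $C_{d1} = \epsilon_{Si}/(x_{max} - W_{ch}/2)$, that is, with the packet width divided
equally between the two depletion capacitances as described above.

Although it appears from this equation that $N_{e,max}$ is linearly related to the difference
in gate voltages, it should be remembered that the depletion capacitance $C_{d1}$ depends both
on the size of the signal charge packet and on the value of $V_{g1}$ through its influence on $x_{max}$.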
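Because $C_{d1}$ itself depends on the packet, (11.28) is best read as an implicit relation. The Python sketch below, a hypothetical illustration rather than the author's procedure, finds $N_{e,max}$ directly: it bisects on $W_{ch}$ until $\phi_{max}$ under the loaded gate, computed from (11.23) and (11.24), falls to that of the empty neighbor, then compares the result with $C_{eff}(V_{g2} - V_{g1})/q$ using $C_{d1} = \epsilon_{Si}/(x_{max} - W_{ch}/2)$. It assumes $\epsilon_{Si} = 11.7\,\epsilon_0$ and the Table 12.1 values; the gate voltages of 0 V and 2 V are arbitrary.

```python
import math

Q_E = 1.602e-19               # electron charge [C]
EPS_SI = 11.7 * 8.854e-18     # assumed permittivity of silicon [F/um]
ND, NA, COX, XN, VFB = 3.8e4, 5.05e3, 8.10e-16, 0.4, 0.6   # Table 12.1, um units


def channel_max(v_g, w_ch):
    """(x_max, phi_max) from (11.23) = (11.24); None if there is no physical root."""
    k = Q_E * ND / (2.0 * EPS_SI)
    c = Q_E * ND / COX
    r = 1.0 + ND / NA
    a = k - k * r
    b = -2.0 * k * w_ch + c + 2.0 * k * r * XN
    c0 = k * w_ch ** 2 - c * w_ch + (v_g - VFB) - k * r * XN ** 2
    disc = b * b - 4.0 * a * c0
    if disc < 0.0:
        return None
    for s in (1.0, -1.0):
        x = (-b + s * math.sqrt(disc)) / (2.0 * a)
        if w_ch <= x <= XN:
            return x, k * r * (XN - x) ** 2
    return None


def n_e_max(v_g1, v_g2, iters=100):
    """Largest N_e [electrons/um^2] held under v_g2 next to an empty gate at v_g1."""
    _, target = channel_max(v_g1, 0.0)
    lo, hi = 0.0, XN                      # bracket on W_ch [um]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        res = channel_max(v_g2, mid)
        # phi_max falls as W_ch grows; an unphysical packet counts as "too full"
        if res is not None and res[1] > target:
            lo = mid
        else:
            hi = mid
    w_ch = lo                             # last width known to be held
    x_max, _ = channel_max(v_g2, w_ch)
    return ND * w_ch, x_max, w_ch


n_e, x_max, w_ch = n_e_max(0.0, 2.0)
c_d1 = EPS_SI / (x_max - w_ch / 2.0)
c_eff = 1.0 / (1.0 / COX + 1.0 / c_d1)
print(f"bisection: {n_e:.0f} e/um^2, (11.28): {c_eff * 2.0 / Q_E:.0f} e/um^2")
```

The two numbers agree, and repeating the call for several values of $V_{g2} - V_{g1}$ shows that $N_{e,max}/(V_{g2} - V_{g1})$ is not constant, which is the nonlinearity the paragraph above warns about.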
11.3 Charge Transfer and Clocking

The primary usefulness of CCDs is derived from their ability to move signal charge from 
one location to another. Raising the voltage on one gate and lowering that on a neighboring 
gate shifts the position of the potential well and also moves any charge contained in it. By 
appropriately sequencing the gate voltages, it is not only possible to transfer charge through 
a long array, but also to perform arithmetic operations, such as adding charge packets or 
dividing one packet into several others. For now we will focus on the clocking strategies 
for charge transfer, since other operations are based on permutations of the fundamental 
transfer sequence. 

For best transfer efficiency, there should be little or no spacing between adjacent gates 
so that the lateral potential in every portion of the channel is always directly controlled. In 
order to maintain electrical isolation, two layers of polysilicon separated by a thin oxide are 
generally used for alternating gates. The clocking sequence for charge transfer depends on 
the number of independent clock signals connected to the gates. CCDs have been designed 
using two-, three-, and four-phase clocking schemes [94]. The two- and three-phase methods 
have the advantage of using fewer clocks and allowing higher device density than four- 
phase clocking. Two-phase clocking, however, requires a special implant and only allows 
charge transfer in one direction [92]. Three-phase clocking allows bi-directional transfer 
but necessitates connecting the same clock signals to different polysilicon layers, making 
it impossible to adjust the signals to overcome threshold mismatches between first- and 
second-level poly. Since, as will be seen in the following chapters, the operations required 
by the MSV algorithm necessitate bi-directional charge transfer as well as adjusting for 
threshold mismatches, a four-phase method was used in this design. 

The charge transfer sequence using four-phase clocking is illustrated in Figure 11-7. At
the beginning of the sequence, the signal charges are held under the gates labeled $\phi_1$ and
$\phi_2$. In the next clock cycle, $\phi_3$ is brought high as $\phi_1$ is brought low, causing the
charge packets to spill into the empty potential wells created under the gates connected to
$\phi_3$ and to move away from the lower potential regions now under $\phi_1$. In the following cycles
the process is repeated by raising and lowering the pairs $\phi_4$-$\phi_2$, $\phi_1$-$\phi_3$, and $\phi_2$-$\phi_4$, at which
point the sequence repeats.
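The sequence can be traced with a small bookkeeping model. The Python sketch below is an idealized illustration with hypothetical names (`phase_of`, `transfer`): it assumes complete transfer, a single packet, and that gate $g$ is driven by phase $\phi_{(g \bmod 4)+1}$. Each step checks that the raised phase opens the well just ahead of the packet and that the lowered phase closes the well behind it.

```python
# (phase raised, phase lowered) for each step of the four-phase sequence
FORWARD = [(3, 1), (4, 2), (1, 3), (2, 4)]


def phase_of(gate: int) -> int:
    """Gate 0 is driven by phi1, gate 1 by phi2, ..., gate 4 by phi1 again."""
    return gate % 4 + 1


def transfer(n_steps: int, sequence=FORWARD) -> list[tuple[int, int]]:
    """Track one packet that starts under gates 0 and 1 (phi1 and phi2 high)."""
    high = {1, 2}
    back, front = 0, 1
    positions = [(back, front)]
    for step in range(n_steps):
        raised, lowered = sequence[step % len(sequence)]
        # Forward transfer: open the well ahead of the packet, close the one behind it
        if raised != phase_of(front + 1) or lowered != phase_of(back):
            raise RuntimeError(f"step {step}: sequence does not advance the packet")
        high.add(raised)
        high.discard(lowered)
        back, front = back + 1, front + 1
        # The packet must now sit entirely under gates whose clocks are high
        assert phase_of(back) in high and phase_of(front) in high
        positions.append((back, front))
    return positions


print(transfer(4))   # -> [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
```

After four steps the packet has advanced four gates, one complete $\phi_1$ to $\phi_4$ set. The model only covers forward transfer; a mirrored sequence that starts by raising $\phi_4$ while lowering $\phi_2$ moves packets the other way, which is the bidirectional capability cited above.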
Figure 11-7: Charge transfer sequence with four-phase clocking.

11.4 Transfer Efficiency

One of the most important characteristics of charge-coupled devices is their charge
transfer efficiency, $\epsilon$, defined as the fraction of the total charge transferred to the receiving
well in one clock cycle. A related quantity, the transfer inefficiency $\eta$, given by

$$\eta = 1 - \epsilon \qquad (11.29)$$

is the fraction of total charge left behind.

In order for large CCD arrays to be useful for analog processing, transfer efficiencies
greater than $\epsilon = 0.99999$ are required. The consequences of poor transfer efficiency can be
easily understood from a simple calculation. After transferring a packet of initial size $Q_0$
through $N$ gates, the final packet size, $Q_N$, is given by

$$Q_N = Q_0\,\epsilon^N \qquad (11.30)$$

With $N = 1000$ and $\epsilon = 0.99999$, $Q_N/Q_0 = 0.99$, implying that by the time the charge
packet reaches the end of the array, the signal will be diminished by 1%.
With $\epsilon = 0.9999$, however, we would have $Q_N/Q_0 = 0.90$, resulting in a 10% loss in the
signal.

Unless there is recombination in the channel, which should not occur to a significant
extent in a well-designed device, the charge is not actually lost but is dispersed over the
array, causing successive packets to be contaminated by the charge left behind. After $N$
transfers, the fraction of the original packet found under gate $i$, counting the starting gate
as $i = 0$, is given by the binomial formula [93]

$$\frac{Q_i}{Q_0} = \binom{N}{i}\,\epsilon^{i}\,(1 - \epsilon)^{N-i} \qquad (11.31)$$
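Both the geometric loss and the binomial spreading are easy to evaluate. The Python sketch below (function names hypothetical) reproduces the 1% and 10% figures quoted above and shows how the lost charge is distributed among the trailing gates; the binomial terms are computed in log space, a choice made here only so that long arrays do not overflow floating point.

```python
import math


def remaining_fraction(eps: float, n: int) -> float:
    """Q_N / Q_0 after n transfers, equation (11.30)."""
    return eps ** n


def charge_distribution(eps: float, n: int) -> list[float]:
    """Q_i / Q_0 for i = 0..n after n transfers (binomial formula).

    Evaluated in log space to avoid overflow for large n. Requires 0 < eps < 1.
    """
    log_eps, log_eta = math.log(eps), math.log1p(-eps)
    out = []
    for i in range(n + 1):
        log_c = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        out.append(math.exp(log_c + i * log_eps + (n - i) * log_eta))
    return out


for eps in (0.99999, 0.9999):
    dist = charge_distribution(eps, 1000)
    print(f"eps={eps}: Q_N/Q_0={remaining_fraction(eps, 1000):.4f}, "
          f"first trailing gate holds {dist[-2]:.4f} of Q_0")
```

The last entry of the distribution equals $\epsilon^N$, and the entries sum to one, confirming that the charge is displaced rather than lost.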

There are three primary mechanisms causing poor transfer efficiency. The first is charge 
trapping by interface or bulk states. As previously explained, one of the primary advantages 
to using buried channel devices is the fact that the density of bulk states is much lower 
than that of the interface states at the Si-SiO$_2$ surface, resulting in much higher transfer
efficiencies in BCCDs than in surface channel devices. 
The second cause of poor charge transfer is a 'bumpy' channel. The lateral potential
profile between the gate spilling the charge packet and the gate receiving the packet does
not necessarily increase monotonically. Potential 'bumps', which can keep some amount
of charge from reaching the neighboring well, can occur when the inter-gate spacing is
too large, when there are changes in the channel width, or when there are corners in the
channel [86]. In addition, since $\phi_{max}$ depends directly on $t_{ox}$ (equation (11.23)), potential
bumps can be caused by variations in the oxide thickness over the length of the transfer gate.

The third mechanism affecting charge transfer is clock frequency. Intuitively, it is clear 
that for the charge to be completely transferred, it has to be allowed enough time to reach 
its destination. Quantitatively, the time needed for complete charge transfer in the absence 
of traps can be determined by analyzing the forces driving the charge packet. 

When the empty potential well is created next to the charge packet by raising the voltage
on the neighboring gate, the initial force pushing the charge into the new well is the mutual
repulsion of the negatively charged electrons, which generates a self-induced field, $E_{si}$. As
the electron concentration decreases in the lower potential region being emptied, the self-induced
field becomes less important and the fringing field, $E_{ff}$, becomes dominant [92],
[93]. The fringing field is created by the lateral influence of the neighboring gates, which
are at a high voltage, on the charge remaining in the emptying well. Because this influence
increases as the effective capacitance $C_{eff}$ between the charge and the gate directly above
it decreases, the fringing fields are larger in buried channel than in surface channel devices,
further enhancing their transfer efficiency. The final stage of charge transfer, after the self-induced
and fringing fields have become negligible, is dominated by thermal diffusion, which
is the slowest of the three transport mechanisms [92].

Determining the exact time-varying charge distribution during transfer requires numerical
solutions to the time-dependent Poisson's equation and continuity conditions. Approximate
solutions, however, have been derived in references [93] and [95]. Citing the results
from the first reference, which give somewhat more insight into the nature of the solution,
the transfer inefficiency after time $t$ due to self-induced drift is

"■'"'"■t+W (11 ' 321 
where

with $L$ being the length of the transfer gate and $\mu_n$ the electron mobility in the channel.
The transfer inefficiency due to fringe-field drift after time $t$ is given by

$$\eta_{ff}(t) \approx e^{-t/\tau_{ff}} \qquad (11.34)$$

with 

l " * On IF ■ I (11 - 35) 

The minimum strength of the fringing field, $|E_{min}|$, is given by

\ Emm \ = ^Yi. ( e -*esi/c eff L _ e -*e Si /3C eff L\ (n _ 36) 



where $\Delta V_g$ is the difference in the gate voltages on the discharging and receiving gates.

From equations (11.33), (11.35), and (11.36) it is clear that the characteristic transfer 
times increase at least as fast as the square of the gate length, L. Minimizing this parameter 
is thus crucial to designing a processor with both adequate speed and transfer efficiency. 

11.5 Power Dissipation 

11.5.1 On-chip dissipation 

Because signal charge moves through the CCD array, there is an effective current flowing
across a resistive medium, causing power to be dissipated on chip. The dissipation per gate
is given by [93]

$$P_{gate} = \int \mathbf{J}\cdot\mathbf{E}\,dV \approx |Q|\,v\,E = \frac{|Q|\,v^2}{\mu_n} \approx \frac{|Q|\,(f_c L)^2}{\mu_n} \qquad (11.37)$$

where $\mathbf{J}$ is the current density, $\mathbf{E}$ is the lateral electric field (of magnitude $E = v/\mu_n$), $Q$ is the charge under the gate,
$v \approx f_c L$ is the average charge velocity, $f_c$ is the clock frequency, $L$ is the gate length and $\mu_n$ is the
carrier mobility.
To obtain a rough estimate of the on-chip power dissipation, we can set $\mu_n = 1500$ cm$^2$/(V·s),
$f_c = 5$ MHz, $L = 10$ μm, and $Q = -6.4 \times 10^{-14}$ coulombs, which is the charge of
400,000 electrons, giving

$$P_{gate} \approx 10^{-9}\ \mathrm{W} \qquad (11.38)$$

Even if the array contained $10^6$ gates, the total on-chip dissipation due to charge transfer
would be only about 1 mW. We can thus for all practical purposes ignore this contribution
to the total power required to operate the processor.

11.5.2 Power dissipation in the clock drivers 

The primary source of power dissipation in operating a CCD array is in the clock drivers,
which must supply the current to charge and discharge the large capacitive loads from all
of the gates tied to a given clock phase at each cycle. For a square-wave signal, the energy
dissipated in the internal resistance of the clock driver when either charging or discharging
a capacitance $C$ is given by [93]

$$E_{clock} = \tfrac{1}{2}\,C V^2 \qquad (11.39)$$

where $V$ is the voltage swing on the capacitor. Since each gate is charged and discharged
once per transfer cycle, the power dissipated per gate is

$$P_{gate} = C V^2 f_t \qquad (11.40)$$

with $f_t$ being the transfer cycle frequency.

A 100 μm$^2$ gate with nominal capacitance of 0.5 fF/μm$^2$ operating with a 5 V swing and
1 MHz transfer cycle frequency thus requires

$$P_{gate} = 1.25 \times 10^{-6}\ \mathrm{W} \qquad (11.41)$$

to be dissipated in the driver circuit. Operating a 256 × 256 four-phase array would therefore
consume at least 82 mW, since even one such gate per pixel gives $65{,}536 \times 1.25$ μW.
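The two power estimates can be reproduced side by side to show why the on-chip term is ignored. This Python sketch uses $|Q|(f_c L)^2/\mu_n$ for the transfer dissipation and $CV^2 f_t$ for the square-wave driver, with the numbers quoted in the text; the function names are hypothetical, and the array total counts one driven gate per pixel, which reproduces the 82 mW figure.

```python
def on_chip_power(q_abs, f_c, length, mu_n):
    """Transfer dissipation per gate, |Q| (f_c L)^2 / mu_n, in SI units."""
    return q_abs * (f_c * length) ** 2 / mu_n


def driver_power(c_gate, v_swing, f_t):
    """Square-wave driver dissipation per gate, C V^2 f_t, in SI units."""
    return c_gate * v_swing ** 2 * f_t


p_chip = on_chip_power(6.4e-14, 5e6, 10e-6, 0.15)   # mu_n = 1500 cm^2/(V s) = 0.15 m^2/(V s)
p_drv = driver_power(100 * 0.5e-15, 5.0, 1e6)       # 100 um^2 gate at 0.5 fF/um^2
p_array = 256 * 256 * p_drv                          # one driven gate per pixel
print(f"on-chip per gate: {p_chip:.2e} W, driver per gate: {p_drv:.2e} W")
print(f"256x256 driver load: {p_array * 1e3:.1f} mW")
```

The driver term per gate is roughly three orders of magnitude larger than the transfer term, which is why only the clock drivers matter in the power budget.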

Power dissipation in the supporting circuitry can be reduced by using tuned sinusoidal 
drivers [93]. In these circuits, the total power required to drive each gate is 



Pgate = ^CV 2 ft (11.42) 



Qf 
where $Q_f$ is the quality factor of the driving oscillator. With well-tuned circuits, power
dissipation in the clock drivers can be reduced significantly.

11.6 Charge Input and Output 

In order to interface other circuits to a CCD processor, signal charge must be introduced 
into the array at some point and the stored charge must later be converted into a usable 
output signal. This chapter is thus concluded by discussing the specific I/O structures used 
in the MSV processor. 

11.6.1 Fill-and-Spill input method 

Input charge may be generated either electrically or optically. Optical input is of course
necessary for image sensors; in a test circuit, however, it is difficult to control. In designing
the prototype MSV processor, an electrical technique known as the fill-and-spill method
was used to generate the repeatable input signals required for testing the chip.

The fill-and-spill method exploits the relation (11.28) between maximum signal charge
and the difference in neighboring gate voltages. The process is illustrated in Figure 11-8.
Voltages $V_{ref}$ and $V_{in}$ are applied to two adjacent gates, creating the relative potential profile
shown, while the potential barrier to the right of the input gate is created by holding the
stop gate (SG) at a lower voltage than $V_{ref}$.

Signal charge is supplied via the ohmic contact made to the n+ diffusion adjacent to
the reference gate. In the fill stage, the diffusion potential, controlled by $V_d$, is lowered,
causing electrons to flood the higher potential regions beneath both the reference and input
gates. On the next clock cycle, $V_d$ is raised so that the diffusion potential is well above
the reference channel potential, causing excess electrons to spill back into the diffusion and
leave behind a charge packet $Q$ whose size is, theoretically, given by

$$Q = (V_{in} - V_{ref})\,C_{in} \qquad (11.43)$$

where $C_{in}$ is the total capacitance of the input gate. Following the spill operation, the signal
charge can be moved into the array by appropriately clocking the stop gate and transfer
gate voltages.

Several variations on the basic structure shown in Figure 11-8 have been used with CCDs. 
Figure 11-8: Fill-and-spill method for charge input

Since it is often useful to maintain a linear relation between the charge packet size and the
voltage difference ($V_{in} - V_{ref}$), fill-and-spill structures are sometimes built from surface
channel devices [86]. To avoid threshold mismatches, the reference and input gates are
usually designed in the same level of polysilicon. This necessitates placing either a floating 
diffusion or a dummy gate in second-level poly [65] to maintain the lateral continuity of the 
channel. In the MSV processor, threshold mismatch in the input stage was not a concern as 
the input test signals were completely controllable and any effects due to mismatch could 
be removed by adjusting the input level. In the present design, it was thus simpler to use 
different polysilicon levels for the input and reference gates, as shown in the diagram. 

The primary difficulty in designing a fill-and-spill device to produce consistent output
levels for given values of $V_{in}$ and $V_{ref}$ is that the charge packet size is in fact a random
variable. First, the amount of charge stored in the capacitor under the input gate will
fluctuate due to thermal noise, with the root-mean-square value of the fluctuations given by [96]

$$\Delta Q = \sqrt{kT C_{in}} \qquad (11.44)$$

where $k$ is Boltzmann's constant and $T$ is the absolute temperature. The ratio of noise to
the total signal is thus

$$\frac{\Delta Q}{Q} = \frac{1}{V_{in} - V_{ref}}\sqrt{\frac{kT}{C_{in}}} \qquad (11.45)$$
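To give a sense of scale, the Python sketch below evaluates the ideal packet of (11.43), the rms thermal fluctuation of (11.44), and their ratio at an assumed temperature of 300 K. The 50 fF input capacitance (a 100 μm$^2$ gate at 0.5 fF/μm$^2$) and the 1 V input difference are hypothetical values chosen for illustration, not parameters of the MSV chip, and thermionic emission is deliberately left out.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant [J/K]
Q_E = 1.602176634e-19    # electron charge [C]


def fill_and_spill(v_in, v_ref, c_in, temp=300.0):
    """Ideal packet, rms kTC fluctuation, and noise-to-signal ratio.

    Thermionic emission over the reference barrier is not modelled.
    """
    q = (v_in - v_ref) * c_in              # ideal packet size [C]
    dq = math.sqrt(K_B * temp * c_in)      # rms thermal fluctuation [C]
    return q, dq, dq / q


# Hypothetical input gate: 50 fF, driven with a 1 V difference
q, dq, ratio = fill_and_spill(1.0, 0.0, 50e-15)
print(f"packet {q / Q_E:.0f} e-, noise {dq / Q_E:.0f} e- rms, ratio {ratio:.1e}")
```

For these values the packet is about 312,000 electrons with roughly 90 electrons rms of thermal noise, so the kTC term alone limits resolution only when $V_{in} - V_{ref}$ becomes small.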

A second problem causing the signal charge to fluctuate is thermionic emission. This is 
the problem discovered by Keast [86] which limited the resolution of the CCD-based absolute 
value of difference circuit. This circuit, which is essentially composed of two cross-coupled 
fill-and-spill structures, has a dead zone for small input differences due to the fact that
small potential barriers are not very effective in holding back the signal charge. Thermionic 
emission is caused by the finite kinetic energy of the electrons. Just as water will boil over 
the side of a pot filled to the rim, the kinetic energy of electrons at the 'top' of a potential 
barrier can give them enough boost to jump over the side. 

In the fill-and-spill structure, the barrier under the reference gate is used to block electrons
from spilling back into the high potential diffusion. Because of the thermionic effect,
however, fewer electrons than predicted by equation (11.43) will remain under the input
gate. Given sufficient time, the charge level will drop until the potential difference under
the input and reference gates is large enough to overcome the average kinetic energy of
the electrons. If the difference between $V_{in}$ and $V_{ref}$ is small, however, this may not occur
before all of the signal charge is lost.

One solution to limiting the effect of thermionic emission is to run the input process 
fast enough so that most of the energetic electrons do not have time to jump the barrier. 
This idea was used by Keast to improve the resolution of the absolute value of difference 
circuit. It is not a good idea, however, for operating the input circuit if stable signal levels 
are desired, since the amount of charge lost per unit time is not a well-controlled quantity. 
For best stability, the fill-and-spill structure should be operated slowly so that the amount
of charge decreases to the level at which the emission current is negligible, a level which
is a function only of the temperature $T$.

11.6.2 Floating gate amplifier output structures 

Sensing the amount of charge in a given packet is usually performed by injecting it onto 
the 'floating' plate of a precharged capacitor and measuring the resulting change in voltage. 
Typically, the sensing capacitor is either a gate or a diffusion connected to a high input 
impedance buffer. In the present design, the floating gate technique was used exclusively 
because it allows for non-destructive sensing, which is necessary since measurements must 
be made at several different stages. In addition, the floating gate structure allows greater 
sensitivity, lower noise, and better matching than the floating diffusion structure [97]. 

A diagram of a floating gate structure placed in a CCD array is shown in Figure 11-9a.
A reset transistor initializes the gate to a voltage $V_i$ and is then turned off, disconnecting
the gate from the clock signal. Charge is injected into the potential well under the floating
gate by clocking the other gates as described in Section 11.3. The change in voltage,
$\Delta V_g = V_f - V_i$, is buffered by a source follower which provides a high impedance connection
to the gate and sets the load capacitance to a fixed value.

The relation between signal charge $Q_{sig}$ and $\Delta V_g$ is best understood using the lumped
capacitance model of Figure 11-9b. $C_{ox}$, $C_{d1}$, and $C_{d2}$ are as defined in Section 11.2, while
$C_{load}$ represents the combined normalized capacitance from the source follower input gate,
the drain diffusion of the reset transistor, the overlap of the adjacent gates, and parasitic
sidewall capacitances:

$$C_{load} = \frac{C_{sf} + C_{drain} + C_{ovl} + C_{sw}}{A_g} \qquad (11.46)$$

Normalization by the gate area, $A_g$, is necessary to maintain consistency of units.
Figure 11-9: Charge sensing using the floating gate technique

The channel charge is divided between the capacitors $C_{eff}$ and $C_{d2}$, whose common
'plate' is at the potential maximum $\phi_{max}$. If charge $Q_{sig}$ is injected into the channel, it will
also divide between the two capacitors such that

$$Q_{sig} = Q_{eff} + Q_{d2} \qquad (11.47)$$

where

$$Q_{eff} = C_{eff}\,(\Delta\phi_{max} - \Delta V_g) \qquad (11.48)$$

and

$$Q_{d2} = C_{d2}\,\Delta\phi_{max} \qquad (11.49)$$

The quantity $\Delta\phi_{max}$ represents the change in the maximum channel potential, while $(\Delta\phi_{max} - \Delta V_g)$
is the change in voltage across $C_{eff}$ as seen from the 'bottom plate'.

The charge $Q_{eff}$ on the bottom plate of $C_{eff}$ must be mirrored by an equal charge of
opposite polarity, $-Q_{eff}$, on the top plate. Since the total charge shared between the top
plate of $C_{eff}$ and $C_{load}$ is constant, the charge on $C_{load}$ must increase by $+Q_{eff}$,

$$\Delta Q_{load} = +Q_{eff} \qquad (11.50)$$

so that

$$\Delta V_g = \frac{Q_{eff}}{C_{load}} \qquad (11.51)$$

Combining (11.48), (11.49), and (11.51), we obtain the following expression for $Q_{d2}$ in
terms of $Q_{eff}$:

$$Q_{d2} = C_{d2}\,Q_{eff}\left(\frac{1}{C_{eff}} + \frac{1}{C_{load}}\right) \qquad (11.52)$$

and taking equation (11.47) into consideration, we obtain the desired expression

$$\Delta V_g = \frac{Q_{sig}}{C_{load}}\left(\frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}}\right) \qquad (11.53)$$
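One way to check the algebra leading to (11.53) is to solve (11.47) through (11.51) as a linear system and compare the result with the closed form. The Python sketch below does this with a small hand-written Gaussian elimination, so it needs no external libraries. The capacitance and charge values are hypothetical, expressed per unit gate area in fF/μm$^2$ and fC/μm$^2$ to keep the numbers near unity, and the function names are illustrative.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def floating_gate_step(q_sig, c_eff, c_d2, c_load):
    """Solve (11.47)-(11.51) for [Q_eff, Q_d2, d_phi_max, dV_g]."""
    a = [
        [1.0, 1.0, 0.0, 0.0],          # Q_sig = Q_eff + Q_d2           (11.47)
        [1.0, 0.0, -c_eff, c_eff],     # Q_eff = C_eff (d_phi - dV_g)   (11.48)
        [0.0, 1.0, -c_d2, 0.0],        # Q_d2 = C_d2 d_phi              (11.49)
        [1.0, 0.0, 0.0, -c_load],      # dV_g = Q_eff / C_load          (11.51)
    ]
    return solve_linear(a, [q_sig, 0.0, 0.0, 0.0])


def closed_form_dv(q_sig, c_eff, c_d2, c_load):
    """Equation (11.53)."""
    alpha = (1 / c_d2) / (1 / c_d2 + 1 / c_eff + 1 / c_load)
    return q_sig / c_load * alpha


# Hypothetical values: Q_sig in fC/um^2, capacitances in fF/um^2
args = (-0.16, 0.25, 0.05, 0.10)
_, _, _, dv = floating_gate_step(*args)
print(f"linear solve: {dv:.4f} V, (11.53): {closed_form_dv(*args):.4f} V")
```

Both routes give about -0.941 V for these inputs; the negative sign reflects the negative electron charge lowering the floating gate voltage.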



In designing a floating gate amplifier, we normally want as much voltage swing at the
output as possible. The final voltage on the gate, $V_f$, is limited, however, by the charge
capacity equation (11.28). If we let $V_l$ represent the low clock voltage which is applied to
the gates neighboring the sense node and let $Q_{max} = -qN_{e,max}$ represent the maximum
signal, then from (11.28) we must have

$$V_f - V_l \ge \frac{qN_{e,max}}{C_{eff}} = -\frac{Q_{max}}{C_{eff}} \qquad (11.54)$$

To simplify notation, let $\alpha$ denote the quantity in parentheses in equation (11.53),

$$\alpha = \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff} + 1/C_{load}} \qquad (11.55)$$

such that the minimum final gate voltage, $V_{f,min}$, is given by

$$V_{f,min} = V_i + \alpha\,\frac{Q_{max}}{C_{load}} \qquad (11.56)$$

We can then eliminate $Q_{max}$ from equations (11.54) and (11.56) to obtain

$$V_{f,min} = \frac{C_{load} V_i + \alpha C_{eff} V_l}{C_{load} + \alpha C_{eff}} \qquad (11.57)$$

For given values of $V_i$ and $V_l$, we can design the floating gate for a desired $V_{f,min}$ by
adjusting $C_{load}$. The smaller $C_{load}$ is with respect to $\alpha C_{eff}$, the closer $V_{f,min}$ will be to
$V_l$. We do have to be careful, however, not to make $C_{load}$ too small, as its value also affects
the maximum charge size, and therefore the signal-to-noise ratio. Eliminating $V_f$ from
equations (11.54) and (11.56), we obtain the following expression for $N_{e,max}$:

$$N_{e,max} = -\frac{Q_{max}}{q} = \frac{1}{q}\,\frac{V_i - V_l}{\alpha/C_{load} + 1/C_{eff}} \qquad (11.58)$$

As $C_{load}$ decreases, so does $N_{e,max}$.
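The trade-off between output swing and charge capacity can be explored numerically. The Python sketch below evaluates $\alpha$ from (11.53), $V_{f,min}$ from (11.57), and $N_{e,max}$ from (11.58) over a sweep of $C_{load}$. All capacitances are hypothetical per-unit-area values in fF/μm$^2$ rather than measured Orbit parameters, the voltages $V_i = 5$ V and $V_l = 0$ V are assumed, and the function names are illustrative.

```python
Q_E_FC = 1.602176634e-4   # electron charge expressed in fC


def alpha(c_d2, c_eff, c_load):
    """Charge-division factor: the quantity in parentheses in (11.53)."""
    return (1 / c_d2) / (1 / c_d2 + 1 / c_eff + 1 / c_load)


def vf_min(v_i, v_l, c_d2, c_eff, c_load):
    """Minimum final floating-gate voltage, equation (11.57)."""
    a = alpha(c_d2, c_eff, c_load)
    return (c_load * v_i + a * c_eff * v_l) / (c_load + a * c_eff)


def n_e_max(v_i, v_l, c_d2, c_eff, c_load):
    """Maximum packet in electrons/um^2, equation (11.58); C in fF/um^2."""
    a = alpha(c_d2, c_eff, c_load)
    return (v_i - v_l) / (a / c_load + 1 / c_eff) / Q_E_FC


# Hypothetical design point: V_i = 5 V, V_l = 0 V, C_d2 = 0.05, C_eff = 0.25 fF/um^2
for c_load in (0.02, 0.05, 0.10, 0.20):
    print(f"C_load={c_load:.2f}: V_f,min={vf_min(5.0, 0.0, 0.05, 0.25, c_load):.2f} V, "
          f"N_e,max={n_e_max(5.0, 0.0, 0.05, 0.25, c_load):.0f} e/um^2")
```

Shrinking $C_{load}$ pulls $V_{f,min}$ toward $V_l$, giving a larger swing, but also lowers $N_{e,max}$, which is the compromise described above.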
CCD Edge Detector Design



The floor plan of the complete MSV edge detection processor is shown in Figure 12-1. 
There are four basic operations performed by the processor: 

1. Charge input, 

2. Smoothing, 

3. Computing the magnitude of the difference between values at neighboring nodes, and 

4. Storing and reading out the binary edge signals. 

A fill-and-spill structure placed at the top of the block marked 'Charge Input and Shift
Register' is used to load each pixel of the image, converting the brightness values to signal 
charge. An image is loaded into the array one column at a time, using the vertical shift 
register to move the pixels to their appropriate rows. Once the last pixel of the column has 
been read in, the contents of the entire shift register are transferred horizontally into the 
processing array. 

Both the smoothing and differencing operations are performed within the array. Edge 
charges are stored separately at each cell for every horizontal and vertical node pair. They 
are not coalesced on-chip into one edge signal per node, as described at the end of Sec- 
tion 5.3, in order to leave more flexibility in the design of external circuits which interface 
to the processor. The horizontal and vertical edge signals for one row, which is selected 
by the decoder block to the right of the array, can be output in parallel at the end of any 
smoothing/differencing cycle without disrupting the proper operation of the algorithm. 
Figure 12-1: Floor plan of the complete MSV processor.

In the following sections I will describe the design, operation, and layout of the circuits 
used in the prototype MSV processor fabricated through MOSIS. I will first discuss the 
architecture of the unit cells which compose the array and then present the detailed design 
of the CCD- and transistor-based processing elements. Test results from the fabricated
circuits are presented in the next chapter. 

12.1 CCD Processing Array 

A block diagram of the unit cell is shown in Figure 12-2, with the corresponding layout,
measuring $224\lambda \times 224\lambda$¹, shown in Figure 12-3. At the boundary of the cell are the CCD
gates in alternating levels of polysilicon which are sized so that when cells are abutted to 
form the processing array, the gate structure seen in Figure 5-3 results. Signal charges 
proportional to the pixel brightness values are stored under the large gates at the corners of 
the cell. One floating gate amplifier (FGA) per cell senses the charge under the gate in the 



¹In the Orbit CCD process, $\lambda = 1$ μm.

Figure 12-2: Unit cell architecture.

Figure 12-3: Unit cell layout.

Figure 12-4: Floating gate design.



lower lefthand corner and feeds the output voltage to the four differential amplifiers which 
are paired with its nearest neighbors. For reasons dictated by the layout, it was simplest to 
have the floating gate amplifier communicate with the differencing circuit directly adjacent 
to it and to those in the neighboring cells to the west and southwest. The edge signals
stored in the blocks marked 'Horizontal' and 'Vertical' thus correspond to edges between 
the two nodes at the base of the cell and the two on the righthand vertical side, respectively. 

12.1.1 Charge sensing 

A more detailed picture of the floating gate amplifier used for charge sensing at the signal nodes is shown in Figure 12-4. The clock phase φ1 is gated through the p-type reset transistor controlled by the signal V_fg. When V_fg is brought low, the node gate voltage is controlled by φ1 for the purposes of charge transfer and storage. When V_fg is brought high, however, the gate is left floating and can thus be used for measuring charge levels as described in Section 11.6.2.

Parameter    Value             Units
N_D          3.8 × 10^16       cm^-3
N_A          5.05 × 10^15      cm^-3
t_ox         420               Å
V_fb         0.6               V
x_n          0.4               µm
C'_ox        8.10 × 10^-16     F/µm²
C'_p1-p2     5.0 × 10^-16      F/µm²
C'_jsw0      3.46 × 10^-16     F/µm
φ_bi         0.8               V

Table 12.1: Orbit CCD/CMOS process parameters (from 10-28-93 run).

In order to sense the signal level, the node must first be emptied by transferring the charge out through the four connecting branches. Once this is done and φ1 is brought high, V_fg is also brought high, turning off the reset transistor and initializing the node gate voltage to the value V_i. The signal charge is then returned and dumped back into the empty potential well, causing the floating gate voltage to change to the value V_f, where



$$V_f = V_i + \frac{Q_{sig}}{C'_{load}}\,\frac{1/C'_{d2}}{1/C'_{d2} + 1/C'_{eff} + 1/C'_{load}} \tag{12.1}$$



The minimum value which V_f can attain is, from equation (11.57),

$$V_{f,min} = \frac{C'_{load} V_i + \alpha C'_{eff} V_l}{C'_{load} + \alpha C'_{eff}} \tag{12.2}$$

where V_l is the low clock voltage and

$$\alpha = \frac{1/C'_{d2}}{1/C'_{d2} + 1/C'_{eff} + 1/C'_{load}} \tag{12.3}$$



In the test system designed to evaluate the prototype processor, the clock drivers were operated between voltages V_h = 4.5 V and V_l = 0.6 V. Targeting a full scale swing of 2 V, i.e., V_f,min = 2.5 V, we can compute the nonlinear depletion capacitances C'_d1 and C'_d2 with N_e = N_e,max from the equations developed in Section 11.2, using the values of the Orbit process parameters given in Table 12.1. From equation (12.2) we then obtain the required load capacitance, C'_load. The results are given in Table 12.2.
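As a numerical check on equations (12.2) and (12.3), the Python sketch below solves for C'_load by bisection. Because α itself depends on C'_load, a simple root search is easier than inverting (12.2) by hand; this is an illustrative choice, not necessarily how the values in Table 12.2 were obtained. C'_d2 and C'_eff are taken directly from Table 12.2 rather than recomputed from the Section 11.2 depletion equations, and V_i is assumed to equal the high clock level V_h = 4.5 V.

```python
def alpha(c_d2, c_eff, c_load):
    """Capacitive division factor alpha of equation (12.3)."""
    inv_d2 = 1.0 / c_d2
    return inv_d2 / (inv_d2 + 1.0 / c_eff + 1.0 / c_load)

def vf_min(v_i, v_l, c_d2, c_eff, c_load):
    """Minimum floating-gate voltage V_f,min of equation (12.2)."""
    a = alpha(c_d2, c_eff, c_load)
    return (c_load * v_i + a * c_eff * v_l) / (c_load + a * c_eff)

def solve_c_load(v_target, v_i, v_l, c_d2, c_eff, lo=1e-18, hi=1e-14):
    """Bisection for C'_load (F/um^2); V_f,min rises monotonically with C'_load."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if vf_min(v_i, v_l, c_d2, c_eff, mid) < v_target:
            lo = mid    # swing too large: more load capacitance is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed inputs: V_i = V_h = 4.5 V, V_l = 0.6 V, C'_d2 and C'_eff from Table 12.2.
c_load = solve_c_load(2.5, 4.5, 0.6, c_d2=9.1e-17, c_eff=2.9e-16)
print(f"required C'_load = {c_load:.2e} F/um^2")
```

With these inputs the root lands close to the 1.4 × 10^-16 F/µm² listed in Table 12.2, which is a useful consistency check between the table and equations (12.2) and (12.3).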

Parameter    Value            Units
N_e,max      3272             electrons/µm²
ψ_max        4.20             V
x_max        0.27             µm
C'_d1        4.6 × 10^-16     F/µm²
C'_d2        9.1 × 10^-17     F/µm²
C'_eff       2.9 × 10^-16     F/µm²
C'_load      1.4 × 10^-16     F/µm²

Table 12.2: Floating gate parameter values for V_g = 2.5 V and N_e = N_e,max.

The voltage change on the floating gate is measured via a source-follower buffer whose design was determined by two primary considerations. The first was the need to minimize the loading capacitance on the floating gate, while the second, and most critical, was the need to have the output voltage in the correct range for interfacing directly to the differential amplifiers. For the latter reason, an n-type source follower was used, despite the fact that a higher gain could be achieved with a p-type design with separate wells connected to the sources of the bias and input transistors.

Within these constraints, it was of course desirable for the gain to be as high as possible. The theoretical small signal gain of the n-type configuration, as shown in Figure 12-4, is given by [98]

$$\frac{v_o}{v_i} = \frac{g_m}{g_m + g_{mb} + 1/R_{eff}} \tag{12.4}$$

where g_m and g_mb are the small signal gate-source and source-bulk transconductances, respectively, of the input transistor, and R_eff is the parallel combination of the input and bias transistor output resistances. Since little can be done to reduce g_mb, maximizing the gain involves making R_eff as large as possible compared to 1/g_m.

To increase the drain resistances, both transistors were drawn with long channels to reduce channel length modulation, and the W/L ratio of the bias transistor was dimensioned to make the drain current, I_D, small. Since the output resistance is proportional to 1/I_D while g_m increases as √I_D [98], the product g_m R_eff increases as 1/√I_D when the drain current is lowered. The W/L ratio of the input transistor, on the other hand, was determined by the output voltage range needed by the differential amplifiers. The sizes used in the final design were W/L = 6λ/4λ for the input transistor and W/L = 4λ/8λ for the bias transistor.
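To make the 1/√I_D scaling concrete, here is a minimal Python sketch of equation (12.4) under a textbook square-law model. The channel-length-modulation coefficient `lam` and the body-effect ratio `eta = g_mb/g_m` are hypothetical placeholders, not Orbit process values, and both transistors are assumed to share the same `lam`. K_n = 46.9 µA/V² is the nominal value quoted later in this chapter, and W/L = 6/4 is the input-device size given above.

```python
import math

def sf_gain(i_d, k_n=46.9e-6, w_over_l=6 / 4, lam=0.05, eta=0.2):
    """Return (v_o/v_i, g_m*R_eff) for the n-type source follower of (12.4).

    Assumptions (hypothetical, for illustration only):
      g_m  = sqrt(2 K_n (W/L) I_D)          square-law transconductance
      r_o  = 1 / (lam * I_D) for each device, so R_eff = r_o / 2
      g_mb = eta * g_m                       fixed body-effect ratio
    """
    g_m = math.sqrt(2.0 * k_n * w_over_l * i_d)
    g_mb = eta * g_m
    r_eff = 1.0 / (2.0 * lam * i_d)   # parallel combination of two equal r_o
    gain = g_m / (g_m + g_mb + 1.0 / r_eff)
    return gain, g_m * r_eff

# Compare a nominal 63 nA bias current with one four times larger.
low_gain, low_product = sf_gain(63e-9)
high_gain, high_product = sf_gain(4 * 63e-9)
print(f"I_D =  63 nA: gain {low_gain:.3f}, g_m*R_eff {low_product:.0f}")
print(f"I_D = 252 nA: gain {high_gain:.3f}, g_m*R_eff {high_product:.0f}")
```

Quadrupling I_D halves g_m R_eff, and the gain approaches the body-effect ceiling 1/(1 + eta) as I_D falls, which is the reason for keeping the bias current small. With placeholder parameters, the absolute numbers should not be read as predictions for the fabricated device.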

Figure 12-5 shows the simulated behavior of the source follower with a bias voltage of V_bias = 1 V, based on the Orbit process parameters from the 10-28-93 run. From the simulations, I_D is computed as 63 nA, giving a static power dissipation of 313 nW for V_dd = 5 V. The predicted DC gain is v_o/v_i = 0.800, with a 3 dB bandwidth of 800 kHz for a 600 fF load approximating that of the differential amplifier input gates. For the maximum and minimum inputs of 4.5 V and 2.5 V, the output voltages are 2.85 V and 1.22 V, respectively, giving a predicted full scale swing of 1.63 V.

Figure 12-5: Simulated source follower characteristic for floating gate amplifier (V_bias = 1 V) [plot: Floating gate source follower characteristic (design simulation)].

The normalized load capacitance, C'_load, on the floating gate is computed from equation (11.46) by summing the contributions from all loads and dividing by the total gate area. The node gates measured 30 µm × 30 µm, with 40 µm × 2 µm of overlap area (2 µm being the minimum poly1-poly2 overlap width allowed in the Orbit design rules) and 80 µm of sidewall perimeter. The capacitances for each element loading the gate are computed and given in Table 12.3. The sidewall capacitance, C_sw, which is a function of the voltage between the buried channel and the substrate, was computed at the minimum channel potential of 4.2 V, for which the unit capacitance C_jsw = 0.14 fF/µm. The total capacitance from all sources is thus estimated at 82.6 fF, giving

$$C'_{load} = \frac{82.6\ \text{fF}}{30\ \mu\text{m} \times 30\ \mu\text{m}} = 9.2 \times 10^{-17}\ \text{F}/\mu\text{m}^2 \tag{12.5}$$

which is comfortably below the value of 1.4 × 10^-16 F/µm² required for a 2 V swing.



Element    Source                    Capacitance
C_sf       input gate                16.5 fF
           gate-source               13.2 fF
           total                     29.7 fF
C_drain    reset transistor          1.6 fF
C_p1       80 µm² × C'_p1-p2         40 fF
C_sw       80 µm × C_jsw(4.1 V)      11.3 fF
Total                                82.6 fF

Table 12.3: Capacitances loading the floating node gate.
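The bookkeeping behind Table 12.3 and equation (12.5) can be checked in a few lines of Python. The sketch below sums the table entries, normalizes by the 30 µm × 30 µm gate area, and also evaluates the sidewall unit capacitance from C'_jsw0 in Table 12.1 using the common SPICE-style grading law C_jsw0/(1 + V/φ)^m. The grading exponent m = 0.5 (abrupt junction) and the reading of the 0.8 V entry in Table 12.1 as the junction potential φ are assumptions, not values stated in the text.

```python
# Load capacitances on the floating node gate (Table 12.3), in fF.
LOADS_FF = {
    "source follower input gate": 16.5,
    "source follower gate-source": 13.2,
    "reset transistor drain": 1.6,
    "poly1-poly2 overlap (80 um^2 x 0.50 fF/um^2)": 80 * 0.50,
    "sidewall (80 um x C_jsw)": 11.3,
}
GATE_AREA_UM2 = 30 * 30   # node gate, 30 um x 30 um

def sidewall_cap_per_um(c_jsw0, v, phi=0.8, m=0.5):
    """Junction sidewall capacitance per unit length at reverse bias v.

    Assumes the SPICE-style grading law c_jsw0 / (1 + v/phi)**m with an
    abrupt-junction exponent m = 0.5 (an assumption, not from the text).
    """
    return c_jsw0 / (1.0 + v / phi) ** m

total_ff = sum(LOADS_FF.values())
c_load_norm = total_ff * 1e-15 / GATE_AREA_UM2      # F/um^2, equation (12.5)
c_jsw = sidewall_cap_per_um(0.346, 4.2)             # fF/um, C'_jsw0 = 0.346 fF/um

print(f"total load     = {total_ff:.1f} fF")
print(f"C'_load        = {c_load_norm:.2e} F/um^2 (target 1.4e-16)")
print(f"C_jsw at 4.2 V = {c_jsw:.3f} fF/um")
```

The normalized result, about 9.2 × 10^-17 F/µm², reproduces equation (12.5), and under the assumed grading law the sidewall value comes out near the 0.14 fF/µm quoted in the text.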



The number of electrons corresponding to the maximum signal is found by multiplying N_e,max for the minimum final voltage of 2.5 V by the gate area, A_g. From Table 12.2, we have N_e,max = 3493 electrons/µm², giving a total of approximately 3.1 × 10^6 electrons.



12.1.2 Clock sequences 
Charge transfer and smoothing 

Six different clock phases were required to operate the MSV gate array. Only four 
are needed to move charge laterally across the array when loading or unloading an image. 
However, for the smoothing operation the motion is along all four branches connected to 
each node and two more clock phases are required to control the direction of charge flow. 

For simple lateral transfer a pseudo four-phase clocking scheme was used. Charge movement in this scheme is identical to that described in Section 11.3; however, the clocking sequence is more complicated because the phases are not arranged in a simple 1-2-3-4 repeating pattern. The clock signals connected to each gate are shown in Figure 12-2. When the charge is held under the node gate, the signal φ1 is high and V_fg is low, as are the signals φ2, φ5, and φ6 connected to the gates neighboring the node. To move charges from the nodes on the left side of the unit cell to those on the right, V_fg and φ6 are held low and
the following sequence is executed: (<f> 2 |), (<h I, fo T), {h I, 4>i T), (fo I, <h T), (<^4 I, h T), 
(0!|, fcf), (02 |, 05 T), (03 I, 01 T), (05 I, 04 T), (01 I, 03 T), (04 I, 05 T), (03 I, 01 T), (05 !).

Figure 12-6: Charge averaging operation.

The up arrow, ↑, indicates that the corresponding signal is brought high, while the down arrow, ↓, indicates that it is brought low.

The 2-D smoothing operation is performed in two passes by sequentially executing a 1-D smoothing operation along the horizontal and vertical branches. Each 1-D operation consists of four steps: (1) splitting the charge held under the node gates into two equal packets, (2) moving the packets out the branches connected to the node towards the mixing gate, (3) averaging the packets from adjacent nodes, and (4) returning the averaged packets back to the node gate where they are added together. Splitting is performed when the charge is entirely confined under the node gate (V_fg low and φ1 high) by first raising the signals on the adjacent gates (φ2 and φ5 for horizontal smoothing; φ6 for vertical smoothing), which causes the charge to distribute itself evenly over the high potential regions under the three gates. Bringing φ1 low then divides the charge into two equal and isolated packets.

For the horizontal smoothing operation, φ3, φ4, and φ6 are held low during the splitting phase. Executing the sequence (φ3↑), (φ2↓, φ5↓, φ4↑), (φ3↓, φ1↑), (φ4↓, φ2↑, φ5↑) then moves the charge packets away from the node gate along the horizontal branches and creates the situation shown at the top of Figure 12-6, where the packets from two adjacent nodes are about to collide. The central gate controlled by φ3 on each of the branches is referred to as the mixing gate. The two unequal packets from the neighboring nodes, initially separated by the barrier at the mixing gate, are added together when φ3 is raised and φ1 is brought low. When φ3 is subsequently lowered and φ1 is brought high, the summed charge is divided in two. The averaged packets are then returned to their respective nodes, where the results from the opposing branches are combined, by executing the inverse of the sequence used to move them away.
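The four-step procedure can be checked with a packet-level bookkeeping model. The Python sketch below is an idealization that assumes exact half-splitting at the node gate and complete transfer along each branch. Its treatment of the array ends, where the outward half-packet returns to its node unmixed because the boundary mixing gate is held low, is an assumption based on the boundary description in Section 12.1.3. For an interior node the model gives the [1 2 1]/4 weighting, and it conserves total charge.

```python
def smooth_1d(q):
    """One idealized 1-D smoothing pass over node charges q.

    Step 1: each node splits its charge into two equal halves.
    Step 2: one half moves toward each neighbouring mixing gate.
    Step 3: the two halves meeting at a mixing gate are summed and re-split.
    Step 4: the averaged halves return to their nodes and are added.
    At the array ends there is no partner packet (assumed stop gate held low),
    so the outward half simply returns to its own node.
    """
    n = len(q)
    halves = [x / 2 for x in q]           # step 1: equal split
    back_left = list(halves)              # charge returned on each node's west branch
    back_right = list(halves)             # charge returned on each node's east branch
    for i in range(n - 1):                # branch between node i and node i + 1
        avg = (halves[i] + halves[i + 1]) / 2   # steps 2-3: collide, then split
        back_right[i] = avg
        back_left[i + 1] = avg
    return [a + b for a, b in zip(back_left, back_right)]   # step 4

print(smooth_1d([0, 0, 0, 4, 0, 0, 0]))   # impulse -> [1, 2, 1] shape
print(smooth_1d([4, 0, 0]))               # boundary node keeps 3/4 of its charge
```

For an interior node i the bookkeeping works out to (q[i-1] + 2 q[i] + q[i+1]) / 4, which is the horizontal kernel (12.6).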

It can easily be shown that the horizontal operation is equivalent to convolving the stored image with the discrete kernel

$$\frac{1}{4}\begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \tag{12.6}$$

while the vertical operation, which is performed in a similar manner by appropriately substituting φ6 for the signals φ2 and φ5 in the clocking sequence, is equivalent to a convolution with

$$\frac{1}{4}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \tag{12.7}$$

The complete smoothing cycle, which consists of performing first the horizontal and then the vertical operations, is thus equivalent to a convolution with the 2-D binomial kernel

$$\frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix} \tag{12.8}$$



Each successive smoothing cycle repeats this operation, so that performing n cycles results in the convolution of the original image with this kernel convolved with itself n − 1 times. For instance, two cycles correspond to a convolution with

$$\frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \tag{12.9}$$

The size of the smoothing filter, which is controlled by the number of cycles performed, can thus, in theory, be made arbitrarily large.
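Because the horizontal and vertical passes are separable, the kernel for n cycles can be generated by repeatedly convolving the 1-D kernel (12.6) with itself and taking an outer product. The Python sketch below does this with exact fractions. It describes interior pixels only and ignores the array boundaries.

```python
from fractions import Fraction

def conv1d(a, b):
    """Full discrete convolution of two 1-D sequences."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def msv_kernel(n_cycles):
    """Equivalent 2-D kernel for n smoothing cycles (interior pixels only)."""
    base = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]   # kernel (12.6)
    k1 = base
    for _ in range(n_cycles - 1):
        k1 = conv1d(k1, base)
    # Horizontal then vertical passes are separable: the 2-D kernel is the
    # outer product of the 1-D kernel with itself.
    return [[r * c for c in k1] for r in k1]

for row in msv_kernel(2):
    print([int(x * 256) for x in row])   # rows of (12.9) scaled by 256
```

Each extra cycle widens the kernel by two pixels in each direction, which is the sense in which the filter size can be made arbitrarily large.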

Avoiding backspill 

One problem which arises from using two polysilicon levels in the gate array is that the channel potentials are not the same for equal gate voltages applied to both levels. For the Orbit process, the difference in the poly1 and poly2 channel potentials is approximately 0.4 V, with poly2 being higher. This potential mismatch can result in incomplete transfer due to backspill. A more accurate depiction of the potential levels along the array during charge transfer is illustrated in Figure 12-7. When a poly1 gate is brought low, a small amount of charge can be transferred backwards by being pulled into the slightly higher potential region under the adjacent poly2 gate on the other side.

To avoid this problem while keeping the clock driver circuits simple, a special level shift circuit, shown in Figure 12-8, was added on chip to the poly2 clock lines. When the clock phase is low and the φQ signal is brought high, the clock line is pulled all the way to ground. Since the low clock voltage used in the test system was V_l = 0.6 V, this causes the channel potential under the poly2 gate to be slightly lower than that under the poly1 gate which is being brought low, and thus prevents backspill. In the following clock cycle, when the poly2 gate is brought low, φQ is held low so that the potential under the poly2 gate at V_l will be slightly higher than that of the poly1 gate behind it. As a result, there is always some small barrier preventing charge from travelling in the wrong direction.

Figure 12-7: Backspill caused by potential mismatch.
Figure 12-8: Level shift circuit for correcting potential mismatch problem.

12.1.3 Boundary processing and charge input 

The cells on the array boundary are different from those of the interior, primarily because 
they are missing one or more neighbors. In addition, the cells along the west boundary of 
the array contain the charge input structure and the vertical shift register for moving the 
input signals to the appropriate row. Only the north and west boundary cells, shown in Figures 12-9 and 12-11, contain differencing and threshold circuits: the north cell for the horizontal pair at the base of the cell, and the west cell for the vertical pair on the right side of the cell.

Smoothing on the boundary cells is inhibited in the direction away from the array by keeping the gates normally used for mixing at a low voltage. For the north and south cells, these gates are simply grounded, while on the east and west cells, they are connected to a clock signal, φsg (for stop gate), which is held low during the smoothing operations. An independent signal is necessary for the stop gates in the east and west boundary cells since they must be clocked to allow charge to be moved into and out of the array.

When loading and unloading images, φsg is driven identically to φ3 in the lateral transfer sequence. An image is loaded column by column, shifting the entire contents of the array one node to the east as the next column is entered. Any charge previously held in the array is also moved, and in particular, the charge held under the node gate of the east boundary cell, shown in Figure 12-10, is dumped out into the high potential n+ diffusion next to the stop gate. In order to have the option of measuring the signal levels of the smoothed image as it is removed from the array, the floating gate amplifier outputs on the east cells are connected to an output pad driver and can thus be sensed off-chip. To use this option, the horizontal transfer sequence must be modified to include turning off the reset transistor and allowing the node gate to float.

Figure 12-9: Boundary cell design (north).
Figure 12-11: Boundary cell design (west).
Figure 12-12: CCD input structure.
Figure 12-13: Layout of the fill-and-spill structure.

Charge input occurs in the northwest corner of the array using the fill-and-spill structure diagrammed in Figure 12-12, with the layout given in Figure 12-13. The operation of this structure is exactly as described in Section 11.6.1, except for the names of the clock phases. Signal charge is supplied via the n+ diffusion at the top of the structure. The reference voltage, V_ref, is connected to the poly1 gate next to the diffusion, while the input voltage, V_in, is connected to the large 30 µm × 30 µm poly2 gate. The area of the input gate is the same as that of the node gates within the array to ensure that enough electrons can be supplied to achieve the full range of the floating gate amplifiers.

The vertical shift register on the west boundary cells is clocked using a standard four-phase method with the signals φs1, φs2, φs3, and φs4, and is connected to the fill-and-spill structure via the two transfer gates clocked by TG1 and TG2, and the stop gate controlled by SG. After the fill-and-spill operation, the input charge is transferred into the shift register by clocking TG1, TG2, and φs4 high and raising the stop gate voltage, V_SG, to an intermediate level. In order to achieve complete transfer without spilling charge back into the input gate, two conditions must be satisfied. The first is that the potentials under the input gate, stop gate, and transfer gates must be cascaded such that

$$\Phi_{in} < \Phi_{SG} < \Phi_{TG1} \tag{12.10}$$

while the second condition is that the maximum signal charge must be contained entirely in the potential well under the two transfer gates and the first shift register gate controlled by φs4. Since the combined area of these gates is 756 µm², the electron density per µm² of gate area for a maximum signal of 3.1 × 10^6 electrons is N_e = 4100 electrons/µm². From the Orbit process parameter values given in Table 12.1 and the equations developed in Section 11.2, the voltage difference between the transfer and stop gates must be approximately 2 V. With a high clock level of V_h = 4.5 V, the maximum value of V_SG must therefore be no greater than 2.5 V, while the voltage difference between the input and reference gates, corrected for the 0.4 V poly1-poly2 potential mismatch, must be 1.4 V to input the maximum number of electrons. Condition (12.10) is thus satisfied for all inputs with V_ref = 0.7 V < V_in + 0.4 V < 2.5 V.
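The charge and voltage budget for the input structure reduces to a few lines of arithmetic, collected here in a small Python sketch. The constants are the values quoted in this section and in Section 12.1.1. `input_in_window` is a hypothetical helper that simply encodes the stated window V_ref < V_in + 0.4 V < 2.5 V, and the 2 V transfer-to-stop-gate difference is taken as given rather than recomputed from the Section 11.2 equations.

```python
N_E_MAX = 3493          # electrons/um^2 at V_f = 2.5 V (value quoted in Section 12.1.1)
NODE_AREA = 30 * 30     # um^2, node gate and input gate area
WELL_AREA = 756         # um^2, TG1 + TG2 + first phi_s4 gate
V_H = 4.5               # V, high clock level
V_TG_MINUS_SG = 2.0     # V, required transfer-gate to stop-gate difference
MISMATCH = 0.4          # V, poly1-poly2 channel-potential offset
V_REF = 0.7             # V, reference gate voltage
FULL_SCALE_DIFF = 1.4   # V, (V_in + 0.4) - V_ref for the maximum signal

q_max = N_E_MAX * NODE_AREA        # about 3.1e6 electrons
n_e_well = 3.1e6 / WELL_AREA       # about 4100 electrons/um^2 in the transfer well
v_sg_max = V_H - V_TG_MINUS_SG     # upper limit on the stop-gate voltage
v_in_full = V_REF + FULL_SCALE_DIFF - MISMATCH   # input voltage for the largest signal

def input_in_window(v_in):
    """Hypothetical check of the stated window V_ref < V_in + 0.4 V < V_SG,max."""
    return V_REF < v_in + MISMATCH < v_sg_max

print(f"q_max = {q_max:.2e} electrons, N_e = {n_e_well:.0f} /um^2")
print(f"V_SG,max = {v_sg_max} V, full-scale V_in = {v_in_full:.1f} V")
```

The full-scale input lands inside the window with some margin below the 2.5 V stop-gate limit, which is what keeps condition (12.10) satisfied for every input level.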

The transfer into the shift register is completed by bringing V_SG back to its low level of 0 V and then clocking TG1↓, followed by (TG2↓, φs1↑). The next clock sequence to be performed depends on whether or not the input value is the last pixel of its column. If so, bringing φs4 low positions the signal charge for entry into the top row of the array, so that executing the horizontal transfer sequence (with φs1 initially clocked as φ1) will move the column of data in the shift register to the first column of node gates. Otherwise, before the next pixel can be loaded, the charges in the shift register must be shifted down one row by executing the following sequence four times: (φs4↓, φs2↑), (φs1↓, φs3↑), (φs2↓, φs4↑), and (φs3↓, φs1↑).

Figure 12-14: Multi-scale veto edge detection circuit block diagram.
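The four-step shift sequence can be checked with a small state model of the register. The Python sketch below is illustrative only. It assumes the gates repeat in the order φs1, φs2, φs3, φs4 along the direction of transfer, treats a packet as spreading over every contiguous high gate it touches, and interprets the shift sequence as (φs4↓, φs2↑), (φs1↓, φs3↑), (φs2↓, φs4↑), (φs3↓, φs1↑). Under these assumptions one pass advances the packet by one four-gate period.

```python
N_GATES = 40
PHASES = (1, 2, 3, 4)   # assumed gate order: phi_s1, phi_s2, phi_s3, phi_s4, phi_s1, ...

# (phase brought low, phase brought high) for each step of one pass
SEQUENCE = [(4, 2), (1, 3), (2, 4), (3, 1)]

def phase_of(gate):
    return PHASES[gate % 4]

def apply_step(region, high, lower, raise_):
    """Update the packet's gate span after one clock step (idealized)."""
    high = (high - {lower}) | {raise_}
    keep = [g for g in region if phase_of(g) in high]
    if not keep:
        raise ValueError("packet stranded under low gates")
    lo, hi = min(keep), max(keep)
    while lo > 0 and phase_of(lo - 1) in high:            # spread backwards
        lo -= 1
    while hi < N_GATES - 1 and phase_of(hi + 1) in high:  # spread forwards
        hi += 1
    return list(range(lo, hi + 1)), high

def run(region, high, passes=1):
    for _ in range(passes):
        for lower, raise_ in SEQUENCE:
            region, high = apply_step(region, high, lower, raise_)
    return region, high

# Start with the packet under a phi_s4 gate and the following phi_s1 gate.
print(run([3, 4], {4, 1}))            # one pass
print(run([3, 4], {4, 1}, passes=4))  # four passes
```

If a clock list were ordered so that no gate under the packet stayed high, `apply_step` raises an error, which is a quick way to catch a mis-ordered sequence in a model like this.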



12.2 Edge Detection Circuit 

In detecting edges using the multi-scale veto rule, differences are computed between the values at neighboring pixels after each smoothing cycle, and an edge is signalled if and only if the magnitude of the difference is above the threshold specified for the cycle in each of the tests. The circuit implementation of the differencing, threshold, and veto operations is shown in block diagram form in Figure 12-14. The outputs of the floating gate amplifiers from two neighboring nodes are connected to the inputs of a double-ended differential amplifier with gain A. The two outputs of the differential amplifier are equal to V_oc + AΔV and V_oc − AΔV, where ΔV = V_1 − V_2 and V_oc is the common-mode output when ΔV = 0. Since it is not known whether ΔV is positive or negative, the threshold test is performed by comparing both outputs to a voltage representing the threshold plus an offset to compensate for V_oc as well as any systematic bias in the comparator. If the threshold voltage is greater than both +AΔV and −AΔV, the edge is vetoed by grounding the input to the storage latch.
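The veto logic can be summarized behaviourally. The Python sketch below is a 1-D software model, not a description of the chip. It assumes ideal [1 2 1]/4 smoothing with a reflecting boundary, a noiseless comparator, and thresholds already corrected for V_oc and any comparator offset. Each edge latch starts set and is discharged the first time the threshold exceeds both signed differences, mirroring the NOR and latch behaviour described in Sections 12.2.2 and 12.2.3.

```python
def smooth_1d(q):
    """[1 2 1]/4 smoothing; end nodes use an assumed reflecting boundary,
    which gives (3*q[0] + q[1])/4 at the first node."""
    n = len(q)
    out = []
    for i in range(n):
        left = q[i - 1] if i > 0 else q[i]
        right = q[i + 1] if i < n - 1 else q[i]
        out.append((left + 2 * q[i] + right) / 4)
    return out

def comparator_veto(dv, gain, v_t):
    """NOR-output model: veto only if V_T beats both +A*dV and -A*dV.

    V_oc and comparator offsets are assumed to be already folded into v_t.
    """
    return v_t > gain * dv and v_t > -gain * dv

def msv_edges(signal, thresholds, gain=1.0):
    """Hypothetical 1-D multi-scale veto: one latch per neighbouring pair."""
    latches = [True] * (len(signal) - 1)     # SET: every latch starts charged
    q = list(signal)
    for v_t in thresholds:                   # one test per smoothing cycle
        q = smooth_1d(q)
        for i in range(len(latches)):
            if comparator_veto(q[i] - q[i + 1], gain, v_t):
                latches[i] = False           # CG grounds the latch; it stays low
    return latches

print(msv_edges([0, 0, 0, 0, 10, 10, 10, 10], thresholds=[3.0, 3.0]))
print(msv_edges([0, 0, 0, 2, 0, 0, 0, 0], thresholds=[1.0, 1.0]))
```

The step edge survives both tests only at the pair straddling the step, while the small isolated bump is vetoed everywhere after the first cycle.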

Since space is limited in the unit cell, it is not practical to duplicate the comparator circuit to perform both tests simultaneously. Instead, the tests are performed sequentially with a single comparator by selectively gating the differential amplifier outputs using the clock signals R1 and R2. The comparator output, which is high if the threshold voltage is less than the differential output, is also selectively switched to one of the inputs of the NOR circuit and is stored on the input gate capacitance when the switch is opened.

Figure 12-15: Clock waveforms for driving edge detection circuit.

The clock waveforms, including R1 and R2, used in the edge detection circuit are shown in Figure 12-15. Signals R3 and R4 are used in the comparator circuit, which is discussed in Section 12.2.2. The signal SET is used to initialize the edge storage latch, discussed in Section 12.2.3, before starting the series of smoothing cycles and threshold tests, while the signal CG is used to gate the result of each threshold test, once it is valid, to the storage latch input.

12.2.1 Double-ended differential amplifier 

The role of the differential amplifier is to generate a signal proportional to the difference in the input voltages which can be compared against a given threshold value. In order to meet the resolution requirements discussed in Chapter 10, the range of measurable differences must extend from 2.5% to 10% of the full scale input range. Since the floating gate amplifiers are designed for a full scale swing of 1.6 V, the range of distinguishable differences must span at least 40 mV to 160 mV.

Figure 12-16: Differential amplifier circuit.
Figure 12-17: Simulated differential amplifier characteristic for common mode voltages V_c = 1.2 V, 2.0 V, and 2.8 V (V_bias = 1 V).

If the differencing circuit is to be effective in detecting edges over the full range of input values, two more requirements must be satisfied. First, the differential amplifier must have a very high common mode rejection ratio, so that a given difference ΔV corresponds to approximately the same output signal for all common mode input levels. Second, the gain, A, of the amplifier should be large enough to magnify the minimum input difference so that it is greater than the minimum resolution of the comparator circuit. On the other hand, A cannot be so large that the amplifier output saturates when the input difference is less than the maximum value which must be measured. These output constraints translate into requiring the amplifier gain to be between approximately 2 and 15.

The circuit diagram of the differential amplifiers used in the prototype MSV processor is shown in Figure 12-16. It consists of two identical cascaded differential pairs with diode-connected p-fet loads. The common mode output and common mode rejection ratio of each pair are determined by the magnitude of the bias current and by the output resistance of the bias transistor. The small signal differential gain, A_d, of each pair is equal to the gain of the half-circuit composed of a single n-fet input stage with a p-fet load [98]. Let

$$g_{m1} = \sqrt{2K_n\left(\frac{W}{L}\right)_1 I_D} \tag{12.11}$$

be the transconductance of the n-fet input and

$$g_{m2} = \sqrt{2K_p\left(\frac{W}{L}\right)_2 I_D} \tag{12.12}$$

be that of the p-fet load, where I_D is the drain current through both transistors, and the factors K_n and K_p are given by K_n = µ_n C_ox and K_p = µ_p C_ox, with µ_n and µ_p being the respective electron and hole mobilities. Since the output voltage is equal to the gate voltage of the diode-connected p-fet, the half-circuit output is the input-transistor signal current divided by the load conductance g_m2, and it is easily seen that

$$A_d = \frac{g_{m1}}{g_{m2}} = \sqrt{\frac{K_n (W/L)_1}{K_p (W/L)_2}} \tag{12.13}$$
The input and load transistors in each pair were sized such that (W/L)_1 = 10λ/4λ and (W/L)_2 = 4λ/8λ. With nominal values for the Orbit process of K_n = 46.9 µA/V² and K_p = 17.0 µA/V², the theoretical differential gain is thus A_d = 3.7. The advantage of cascading the two low-gain differential amplifiers is that the combined differential and common mode gains are the products of the individual terms. We thus obtain both a high differential gain and a high common mode rejection ratio with half the input capacitance loading the floating gate amplifier outputs.
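A quick first-order check of equation (12.13) and of the two-stage cascade is sketched below in Python. It uses square-law transconductances only, so it ignores the second-order effects captured by the circuit simulator. K_p = 17.0 µA/V² is the value consistent with the quoted A_d = 3.7.

```python
import math

K_N = 46.9e-6    # A/V^2, nominal Orbit value quoted above
K_P = 17.0e-6    # A/V^2, value consistent with the quoted A_d = 3.7
WL_1 = 10 / 4    # input n-fet, (W/L)_1 = 10λ/4λ
WL_2 = 4 / 8     # p-fet load, (W/L)_2 = 4λ/8λ

def pair_gain(k_n, k_p, wl_1, wl_2):
    """Magnitude of equation (12.13); I_D cancels since both devices share it."""
    return math.sqrt((k_n * wl_1) / (k_p * wl_2))

a_d = pair_gain(K_N, K_P, WL_1, WL_2)   # first-order gain of one pair
a_total = a_d ** 2                      # cascaded pairs: gains multiply
print(f"A_d per pair = {a_d:.2f}, cascade = {a_total:.1f}")
print("within ~2 to ~15 window:", 2 <= a_total <= 15)
```

The first-order cascade gain of about 13.8 sits just inside the 2 to 15 window. The simulated values reported next are lower, which the text attributes to second-order effects that this hand model omits.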

The simulated output characteristics of the differential amplifier, using the Spice level 2 parameters supplied by Orbit for the 10-28-93 run, are shown in Figure 12-17 for common mode inputs of 1.2 V, 2.0 V, and 2.8 V. The gain A_d of each pair determined by the simulation was 1.97, with a combined gain of 7.70. The difference between these values and that predicted by equation (12.13) is due to the more accurate model used by the HSPICE simulation program, which includes second-order effects such as channel geometry and threshold variations.

The fact that the two output voltages are symmetric about V_oc = 3.67 V only for input differences less than 10 mV has no impact on the threshold test, as only the range of the positive difference, V_o > V_oc, is important. The output response shows good common-mode rejection for the first 70 mV of input difference and, for differences between 70 mV and 200 mV, only a slight dependence on the common mode level, which should not greatly affect the overall performance of the edge detector.

12.2.2 Comparator circuit 

The comparator and dynamic NOR circuits used for the threshold tests are shown in Figure 12-18. The basis of the comparator is a standard clocked CMOS sense amplifier developed for measuring small voltage differences in memory circuits [99]. When the clock signal R3 is low and R4 is high, one of the gates of the back-to-back inverters is precharged to the output value from the differential amplifier, while the other is precharged to the voltage representing the threshold value. When R3 is brought high, with R4 still high, the sources of the two n-fets at the base of the sense amp are grounded. The transistor whose gate is precharged to the higher of the two input voltages will initially pull more current, and will thus bring its drain voltage to ground more quickly than the other transistor, whose gate is precharged to the lower voltage. Since the drain of each transistor is connected to the gate of the other, when one drain goes to ground it shuts off the opposing transistor, preventing its drain from discharging further. One clock tick after R3 is brought high, once the drain-gate voltages on the transistors have settled, R4 is brought low, connecting the sources of the two p-fets at the top of the sense amp to V_dd. The process is then repeated in reverse, such that the side which was not completely discharged when R3 was brought high is now brought all the way to V_dd.

Figure 12-18: Sense amplifier voltage comparator for threshold tests.
Figure 12-19: Sense amplifier response with resolution of δ = 10 mV.

The sense amplifier thus produces two binary outputs, one for each input, which are high when the corresponding input is greater than the other one and low when the corresponding input is less. A real sense amplifier, however, has a finite resolution due to the √(kT/C) noise in charging the gate capacitances², and thus cannot distinguish voltages that are arbitrarily close together. The resolution, δ, is defined for a given confidence level α such that the sense amplifier will produce the correct output with probability p > α when the magnitude of the difference in the two inputs is greater than δ/2. An example response characteristic for the present situation is illustrated in Figure 12-19, where the x-axis is plotted for A|ΔV|, given V_oc = 3.6 V and assuming no systematic offset in the sense amplifier. In between the dashed lines, where V_oc + A|ΔV| < V_T + δ/2 and V_oc + A|ΔV| > V_T − δ/2, the probability that the sense amplifier output is correct is less than α. The resolution δ, shown in the diagram as 10 mV, is the horizontal distance between the two dashed lines and represents the minimum voltage difference which can be reliably measured.
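The resolution definition can be made concrete with a noise-only model. The Python sketch below assumes that the only error source is independent Gaussian √(kT/C) sampling noise on the two precharged gates, with no systematic offset; the 20 fF gate capacitance is a hypothetical figure, not a value extracted from the layout.

```python
import math
from statistics import NormalDist

K_B = 1.380649e-23   # Boltzmann constant, J/K

def diff_noise_sigma(c_gate, temp_k=300.0):
    """RMS noise on the difference of two independently precharged gates.

    Each gate is assumed to carry independent Gaussian kT/C noise of
    sqrt(kT/C), so the difference has sqrt(2) times that spread.
    """
    return math.sqrt(2.0) * math.sqrt(K_B * temp_k / c_gate)

def p_correct(diff, c_gate, temp_k=300.0):
    """Probability that the latch resolves the sign of `diff` correctly."""
    return NormalDist().cdf(abs(diff) / diff_noise_sigma(c_gate, temp_k))

def resolution(c_gate, alpha, temp_k=300.0):
    """delta such that p_correct >= alpha whenever |diff| >= delta/2."""
    return 2.0 * diff_noise_sigma(c_gate, temp_k) * NormalDist().inv_cdf(alpha)

delta = resolution(20e-15, 0.99)   # hypothetical 20 fF gate capacitance
print(f"delta = {delta * 1e3:.2f} mV")
```

For these assumed values δ comes out at roughly 3 mV. Because δ scales as 1/√C, quadrupling the gate capacitance halves it.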

The sense amplifier output connected to the V_T input side is fed into an inverter whose output is gated to one of the inputs on the adjoining NOR circuit. The inverter, which consists of a single n-fet with a p-fet load clocked by R4, is used to isolate the sense amplifier from the uneven capacitance of the NOR input gates. The fact that the inverter input itself creates a capacitive imbalance between the two sides of the sense amplifier is unimportant, as the base level of the threshold voltage V_T can be adjusted to compensate for the resulting offset. The important issue is that the input capacitance to the inverter is constant, while that of the NOR circuit depends on the result of the previous test. An analysis of the NOR circuit shows that the gate-source capacitances, C_gs, of the two p-fets are each a function of the charged state of the other transistors, as this determines whether or not there is a conducting path through the circuit during the precharge phase of the sense amplifier.

Since the inverter output is low when V_T wins the comparison, the NOR output will be high only if the threshold voltage, V_T, is greater than both V_oc + AΔV and V_oc − AΔV. In this case the switch transistor at the bottom of the two-transistor chain connected to the edge storage latch input is turned on. The second transistor, which is clocked by the signal CG, is turned on after both comparisons have been completed and the NOR output is stable. If the NOR output is high, the storage latch input is connected to ground, and the edge signal is discharged. If it is low, however, the lower switch is open, and raising CG has no effect on the state of the storage latch.

² See Section 11.6.1 for a discussion of charging noise.

Figure 12-20: Edge charge storage latch.



12.2.3 Edge storage 

The edge storage latch, which consists of a pair of cross-coupled p-fets along with two n-fets for initializing and discharging the edge signal, is shown in Figure 12-20. At the beginning of the multi-scale veto procedure, before any threshold tests are performed, the latch is initialized by bringing the SET signal high. This action turns on the lower n-fet, bringing its drain voltage to ground, and thereby turning on the upper left p-fet, which pulls the n-fet gate all the way to V_dd. When SET is brought low, the positive feedback of the n-p transistor combination will maintain the charged state of the latch indefinitely as long as the gate of the lower n-fet is not grounded.

If the edge is vetoed, however, i.e., if the NOR output is high, the n-fet gate will be connected to ground when CG goes high. If this occurs, the upper right p-fet will be turned on and will pull the drain of the n-fet to V_dd, shutting off the upper left p-fet. The discharged state of the input to the CMOS inverter formed by the righthand n-p transistor combination is also stable, since the SET transistor is not turned on again for the remainder of the edge detection procedure and there is no other mechanism for recharging the latch.

The edge signals for a given row are read out by bringing the 'Row Select' signal high, which connects the latch output to the bit line for its column. The bit line is in turn connected to an inverting digital output pad driver to bring the signal off-chip. The current for charging and discharging the pad driver is supplied by the CMOS inverter on the righthand side of the latch. Since this current can be supplied without affecting the state of the inverter input, the edge signals can be read out nondestructively at any point during the edge detection procedure.



Chapter 13 



Edge Detector Test Results 



Several fabrication runs through the Orbit CCD process were necessary to finalize
and debug the design of the multi-scale veto edge detector. After several unsuccessful
attempts to build a charge-based absolute-value-of-difference circuit with the required
resolution, similar to the one used by Keast [86], the transistor-based design described in
Section 12.2.1 was developed. Tinychips¹ containing test structures for the new circuit
were sent out on 6-30-93. Based on the results from this run, a 32x32 array (the largest
size that could fit on the maximum 7.9mm × 9.2mm die), along with a second full-size
chip containing isolated test structures, were sent out for fabrication on 10-28-93.

Using the test system described in the next section, which was designed based on the
layout of the 32x32 array processor, it was possible to obtain a more accurate
characterization of the CCD performance than had been available from the much simpler test setup
used with the Tinychips. Several previously unnoticed problems, such as charge trapping
in the input shift register and the backspill problem discussed in Section 12.1.2, were thus
discovered, and a final design with these minor issues corrected was sent out on 4-27-94.
The die photographs of the fabricated 32x32 array and of the full-size test structures chip are
shown in Figures 13-1 and 13-2. The test results presented in this chapter were obtained
from the chips returned from this latest run.



¹A Tinychip is a low-cost 2.22mm × 2.25mm die size offered by MOSIS for test designs.







Figure 13-1: Die photograph of 32x32 array. 
Figure 13-2: Die photograph of the test structures chip.

13.1 Test System

In order to test the prototype multi-scale veto chip, it was necessary to supply a total of
31 programmable control signals:

• The six clock phases for driving the CCD array, $\phi_1$ through $\phi_6$, plus the floating-gate reset
signal, $V_{fg}$, and the control signal for the backspill prevention circuit;

• The edge detection control signals $R_1$, $R_2$, $R_3$, $R_4$, CG, and SET;

• The input and shift register clock phases $TG_1$, $TG_2$, $\phi_{S1}$ through $\phi_{S4}$, and $\phi_{SG}$, and the two
control signals for switching the variable analog waveforms connected to the stop gate,
SG, and the input diffusion, $V_d$; and

• Eight signals for selecting the row of edge outputs to be read and for enabling and
pre-charging the output drivers.

It was also necessary to supply programmable analog waveforms for the input and reference
voltages, $V_{in}$ and $V_{ref}$, as well as for the edge threshold voltages, $V_T$, and to read and digitize
the analog voltages from the floating-gate amplifiers representing the smoothed brightness
values. Finally, a wide data path connected to a high-speed memory buffer was needed to
store the edge outputs as they are read out.

The test system designed to perform these functions is illustrated in Figure 13-3. In
order to preserve maximum flexibility and to minimize programming time, the system was
built around a DELL 486 personal computer housing three commercially made boards for
facilitating the interface to the device under test (DUT). The first of these boards,
manufactured by DATEL, Inc., is a 4-output power supply for driving the MSV processor and
its supporting circuitry. The second, also manufactured by DATEL, Inc., is an I/O board
capable of digitizing 16 independent analog inputs to 12 bits with a conversion time of 12µs
per input. It can also supply 4 independent analog outputs from 12-bit digital data stored
in 4 on-board registers, with a settling time of 5µs to 0.05% of full-scale range. The third
board, made by White Mountain DSP, Inc., is a TI TMS320C40-based evaluation board,
originally designed for testing applications using the 'C40 microprocessor before building a
stand-alone system. In the present test system, this board turned out to be very useful due
to its 32-bit bus interface and its access to the 'C40 read/write control signals via the 96-pin
on-board Eurocard connector. It also has a high-speed internal path to an on-board 4K
dual-port SRAM, which can be accessed by the PC as shared memory. All three boards
communicate with the DELL over its PC/AT bus and can be programmed in a high-level
language. In addition, programs for the 'C40 evaluation board can be loaded and executed
independently of the DELL's 486 microprocessor, so that separate programs can be executed
asynchronously to drive the clock waveforms for the test system and to send and acquire
data over the I/O board.

Figure 13-3: Test system design.

The custom-designed portions of the test system include a pair of boards outside the
PC, which contain the device under test, circuitry for generating and switching the constant
analog bias voltages used in the processor, and the clock driver circuits. In addition, a
daughter board, which mates with the 'C40 evaluation board via the 96-pin connector, was
designed to manage the flow of data across the 32-bit bus. The daughter board contains
a bank of registers where the values for the 31 control signals for the MSV processor are
latched, and a set of tri-state bus drivers used to transmit the edge outputs over the bus
to the internal SRAM. One 32-bit path connects a set of tri-state buffers on the device
board to the bus drivers on the daughter board. Since the prototype processor contains a
32x32 array, there are 63 edge outputs per row (32 vertical and 31 horizontal), which are
output simultaneously. The external buffers are used to select one-half of the edge signals
to transmit to shared memory during one read cycle.

A second 32-bit path connects the 31 register outputs and the system ground on the 
daughter board to the clock drivers that generate the control signal waveforms for the 
device under test. External drivers are necessary as the register outputs are neither clean 
enough nor strong enough to drive the CCD gates and the edge detection circuits directly. 
For proper operation, it is important to control the rise- and fall-times of the waveforms 
and to minimize ringing. The drivers must also supply enough current to bring the highly 
capacitive loads formed by the CCD gates to their final levels within the time allowed. 

The clock drivers were built based on a standard circuit, shown in Figure 13-4, which is used in
some CCD cameras and described in reference [100]. This circuit uses a National
Semiconductor DS0026 chip containing two drivers which, when supplied with TTL digital inputs,
produce an inverted output capable of driving a 1000pF load at 5MHz. A 2kΩ
potentiometer is used to adjust the rise- and fall-times of the clock signals to approximately
100ns, and a diode protection circuit prevents the clock waveform from ringing below the
MSV chip substrate voltage, $V_{ss}$. This protection circuit is crucial to prevent the on-chip
input protection diodes from turning on and causing charge injection into the substrate,
which can then be collected in the CCD array.

Figure 13-4: Clock driver circuit used in the test system.

One disadvantage of the test system as designed is that, despite the 40ns instruction
cycle time of the 'C40, the minimum control signal pulse width is limited to 800ns by both
the programming overhead and the propagation delays through the system. It was
thus impossible to test how much faster than 625kHz the MSV processor could be
operated. Convenience and flexibility in debugging the design of the processor itself were
judged to be more important, however, than optimizing the test system for speed.
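The 625kHz limit quoted above follows from the 800ns minimum pulse width if a full clock period is taken to need one minimum-width high pulse and one minimum-width low interval; that symmetric-period reading is an assumption made here. A small sketch of the arithmetic, with hypothetical function names:

```python
INSTRUCTION_CYCLE_S = 40e-9   # 'C40 instruction cycle time
MIN_PULSE_S = 800e-9          # minimum control-signal pulse width in the test system


def max_clock_hz(min_pulse_s: float) -> float:
    # Assumes a symmetric period: one minimum-width high phase plus one low phase.
    return 1.0 / (2.0 * min_pulse_s)


def instructions_per_pulse(min_pulse_s: float, cycle_s: float) -> int:
    # Number of 'C40 instruction cycles that fit in one minimum-width pulse.
    return round(min_pulse_s / cycle_s)


if __name__ == "__main__":
    print(f"max clock: {max_clock_hz(MIN_PULSE_S) / 1e3:.0f} kHz")
    print(f"instruction cycles per pulse: "
          f"{instructions_per_pulse(MIN_PULSE_S, INSTRUCTION_CYCLE_S)}")
```

Twenty instruction cycles fit in each minimum-width pulse, which is consistent with the statement that overhead and propagation delays, rather than the 'C40 cycle time itself, set the limit.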



13.2 Driving Signals Off-Chip 

Two types of output pad drivers, one digital and one analog, were used to drive signals
from the processor off-chip. The digital pad drivers, which are used with the edge signal
bit lines, are simply large CMOS inverters designed to drive a 20pF load from 0V to 4V, or
from 5V to 1V, in less than 10ns for a ±5V step input. The input capacitance of the digital
driver, as seen from the edge storage latches in the array, is approximately 2.5pF, of which
only 0.5pF is due to the gate capacitance of the inverter; the other 2pF is due to the
capacitance of the bit line itself. Simulation results indicate that the storage latch can drive
the inverter input from 5V to < 1V in 35ns, and from 0V to > 4V in 55ns. The rise and fall
times of the edge output signals on the test chips, measured with an oscilloscope from the
time that the 'Row Select' signal was brought high, were confirmed to be approximately
the same as predicted by the simulation. Given the 800ns control signal pulse widths used
in the test system, the edge outputs had ample time to become stable before being read
into the 'C40 SRAM.

Figure 13-5: Analog output pad driver circuit.

The analog pad drivers were designed to buffer the output voltages from the different 
isolated structures laid out on the test chip, as well as from the floating gate amplifiers on 
the east boundary of the array which were used to output the smoothed image data. Since 
the original voltages could only be recovered by correcting the measured signals for the 
pad driver response, it was very important to achieve good matching between the different 
drivers in order to accurately measure the responses of the on-chip structures. 

The analog pad driver circuit, which is shown in Figure 13-5, consists of a pair of
cascaded n- and p-type source followers. To achieve good matching, the transistors were
drawn with large W/L ratios to minimize the percentage effects of process variations. Since
the on-chip structures generally can supply very little current, it was important to minimize the
additional load capacitance due to the drivers. The total gate capacitance of the first stage
is approximately 230fF, almost 7 times smaller than the 1.5pF capacitance of the
second stage that drives the output load. The metal2 lines leading to the pad drivers,
which are much shorter than the edge signal bit lines that run across the chip, have an
average capacitance of approximately 120fF, giving a total load of about 350fF.
Figure 13-6: Analog output pad driver characteristic, simulated vs. actual ($V_{low}$ = 1.2V,
$V_{high}$ = 3.8V).



The input stage was chosen as an n-type device in order to sense the full range of
output voltages from the various on-chip structures, which lie between 1.5V and 5V. Since
the voltage levels are shifted down by the first stage, the second stage could be built with
a higher-gain p-type device, laid out with separate wells for each transistor to eliminate the
backgate effect. The total power dissipation of each driver is approximately 390µW, the
highest of any structure on the chip. Separate $V_{dd}$ and GND rails were thus drawn
so that the relatively large currents drawn by the drivers would not affect the circuits
on the rest of the chip, and so that the actual power dissipation could be measured
independently.

Figure 13-6 shows the results of test measurements from 24 different pad drivers on
12 different chips, compared with the performance predicted by simulation. The solid line
represents the average output from all 24 drivers. The standard deviation of the individual
responses from the average curve is 8mV, with a maximum absolute variation of 17.6mV.
The average measured gain is 0.877, which is slightly better than the predicted value of
0.848.
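Because the on-chip voltages are recovered by correcting the measured outputs for the pad driver response, a calibration step is needed. A minimal sketch, assuming the driver characteristic is close enough to linear over the 1.2V to 3.8V range of Figure 13-6 to be described by a gain and an offset. The -0.9V offset in the example is hypothetical, as are the function names; if the measured curve bends, a lookup table over the characteristic would be the safer choice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = gain * xs + offset."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    gain = sxy / sxx
    return gain, my - gain * mx


def correct_for_pad_driver(v_measured, gain, offset):
    """Invert the linear driver model to estimate the on-chip voltage."""
    return (v_measured - offset) / gain


if __name__ == "__main__":
    # Hypothetical calibration sweep: known inputs applied to a driver with
    # the measured average gain (0.877) and an assumed offset of -0.9 V.
    v_in = [1.2 + 0.2 * k for k in range(14)]   # 1.2 V ... 3.8 V
    v_out = [0.877 * v - 0.9 for v in v_in]
    gain, offset = fit_line(v_in, v_out)
    print(f"gain={gain:.3f}, offset={offset:.3f} V")
    print(f"on-chip estimate for 2.0 V at the pin: "
          f"{correct_for_pad_driver(2.0, gain, offset):.3f} V")
```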

13.3 CCD Tests 

Correct operation of the MSV processor depends critically on the device characteristics 
of the CCD structures used in the array. The parameters which most affect signal processing 
ability are the magnitudes of the channel potentials, the amount of dark current, and the 
charge transfer efficiency. Isolated test structures were laid out to measure each of these
parameters, as well as to measure the input/output characteristics of the fill-and-spill
and floating-gate amplifier structures used to interface with the array.

13.3.1 Channel potential measurements 

To measure the maximum potential, $\phi_{max}$, in the buried channel as a function of the
applied gate voltage, a separate structure was laid out on the test chip containing two CCD
'transistors', each formed by placing n+ diffusions on either side of an isolated polysilicon gate
covering a segment of buried channel. Two such transistors were needed, one for each of the
two polysilicon layers. The four diffusions were connected via metal contacts to separate
unprotected and unbuffered pins so that their voltages could be measured directly.

To measure the potentials, the 'source-follower' method described by Taylor and Tasch
in [101] was used, as this approach was the simplest to apply given the test system setup.
In this method, the substrate is grounded and a voltage is applied to the gate of each CCD
transistor. The diffusions on one side of the transistors, acting as the drains, are set to
a high voltage, while the source diffusions are connected to ground via a high-impedance
load. In this situation, the transistors are at the threshold of conduction, so that the internal source
potential is equal to $\phi_{max}$. Due to the potential differences across the metal contacts and the
n-n+ and p-p+ junctions, however, the measured source voltage is $V_s = \phi_{max} - V_n$, where $V_n$ is the built-in voltage
across the n-p junction at the channel-substrate interface. The channel potentials are thus
recovered by adding 0.8V, which is the value given by Orbit for $V_n$, to the measured values.

The $\phi_{max}$-$V_g$ curves obtained for the chips from the 4-27-94 and the 10-28-93 runs are
plotted in Figures 13-7 and 13-8. A process change was made by Orbit for the 4-27-94 run
which had the effect of lowering the potentials by approximately 2V from those of the earlier
run. This decrease does not greatly affect the results of the calculations in Sections 12.1.1
and 12.1.3 used to determine the sizes of the input and floating-gate structures, as these
depend mostly on the differences in the channel potentials at different gate voltages rather
than on their actual values. It does, however, affect the charge-carrying capacity of the CCD
gates, and therefore the signal-to-noise ratio of the devices.

Figure 13-7: CCD channel potentials (4-27-94 run).

Figure 13-8: CCD channel potentials (10-28-93 run).

As can be seen from the diagrams, there is little change in the slopes of the $\phi_{max}$-$V_g$
curves between the two runs. The less-than-unity slope of 0.885 for poly1 and 0.877 for poly2
is due to the capacitive divider between $C_{eff}$ and $C_{d2}$ explained in Sections 11.2 and 11.6.2.
Since

$$\frac{\Delta\phi_{max}}{\Delta V_g} = \frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff}} \qquad (13.1)$$

we have (for poly1)

$$\frac{C_{d2}}{C_{eff}} = \frac{1}{0.885} - 1 \approx 0.13 \qquad (13.2)$$

which is less than half the value calculated from the estimated design parameters given in
Table 12.2.
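The reduction steps in this subsection (adding the 0.8V built-in voltage back to the measured source voltages, fitting the slope of the $\phi_{max}$-$V_g$ curve, and inverting equation (13.1)) can be sketched as follows. The gate-voltage sweep and the 6.0V intercept are hypothetical, synthesized so that the measured poly1 slope of 0.885 is reproduced; only the 0.8V built-in voltage and the two slopes come from the measurements.

```python
V_BUILT_IN = 0.8  # n-p built-in voltage given by Orbit (V)


def channel_potentials(v_source):
    """Add back the built-in voltage to recover phi_max from measured V_s."""
    return [v + V_BUILT_IN for v in v_source]


def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def cd2_over_ceff(s):
    """Invert s = (1/Cd2) / (1/Cd2 + 1/Ceff), i.e. s = 1 / (1 + Cd2/Ceff)."""
    return 1.0 / s - 1.0


if __name__ == "__main__":
    v_gate = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    # Hypothetical measured source voltages for a poly1 gate.
    v_s = [0.885 * vg + 6.0 - V_BUILT_IN for vg in v_gate]
    s = slope(v_gate, channel_potentials(v_s))
    print(f"slope={s:.3f}, Cd2/Ceff={cd2_over_ceff(s):.2f}")
    print(f"poly2: Cd2/Ceff={cd2_over_ceff(0.877):.2f}")
```

Applying the same inversion to the poly2 slope of 0.877 gives a ratio of roughly 0.14.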

13.3.2 Input and output 

The fill-and-spill and floating-gate amplifier structures were tested jointly by combining
both devices in a single 1-D gate array. A separate test structure containing only the
source-follower buffer used in the floating-gate amplifier was also laid out on the test chip so that
it could be characterized independently.

The measured source-follower response from one chip is plotted in Figure 13-9, with
the dashed line indicating the measured voltages and the solid line representing the data
corrected for the analog pad driver response. The average gain measured from twelve devices
on twelve different chips was 0.895, which is considerably higher than the value of 0.800
predicted by the simulation shown in Figure 12-5. This discrepancy is due to the smaller
actual values, compared with those used in the simulation, of the process parameters
$\gamma = \sqrt{2q\epsilon_{si}N_A}/C_{ox}$, which sets the magnitude of $g_{mb}$, and $\lambda$, the channel-length
modulation parameter, which sets the output conductance. The values given for the 4-27-94 run were
$\gamma = 0.4493$ and $\lambda = 0.0304$, as opposed to the values from the 10-28-93 run, given as
$\gamma = 0.4977$ and $\lambda = 0.0318$. Matching between the different source followers was relatively good,
with an overall standard deviation of 14.5mV from the average output values and a maximum
deviation of 37mV.

Figure 13-9: Measured source follower characteristic for floating gate amplifier ($V_{bias}$ = 1V).
Solid line: corrected for pad drive; dashed line: raw data.
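To illustrate why a smaller $\gamma$ raises the source-follower gain, the sketch below uses the textbook long-channel approximation $A_v \approx g_m/(g_m + g_{mb}) = 1/(1+\eta)$ with $\eta = \gamma/(2\sqrt{2\phi_F + V_{SB}})$. It ignores the output conductance set by $\lambda$ and any load, and the values $2\phi_F = 0.7$V and $V_{SB} = 2$V are assumptions chosen only for illustration, so the absolute gains it prints should not be compared with the measured 0.895; only the direction of the change is meaningful.

```python
import math


def body_effect_eta(gamma, two_phi_f=0.7, v_sb=2.0):
    """eta = g_mb / g_m for a long-channel MOSFET (two_phi_f and v_sb assumed)."""
    return gamma / (2.0 * math.sqrt(two_phi_f + v_sb))


def source_follower_gain(gamma, two_phi_f=0.7, v_sb=2.0):
    """Ideal source-follower gain with only the body effect: 1 / (1 + eta)."""
    return 1.0 / (1.0 + body_effect_eta(gamma, two_phi_f, v_sb))


if __name__ == "__main__":
    for run, gamma in (("10-28-93", 0.4977), ("4-27-94", 0.4493)):
        print(f"{run}: gamma={gamma}, gain ~ {source_follower_gain(gamma):.3f}")
```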

The results of the combined fill-and-spill and floating-gate amplifier structures from
twelve different chips are shown by the individual points in Figure 13-10, with the solid
line representing the average output. The values are plotted against the difference in the
voltages applied to the input and reference gates. Since these gates are in different levels
of poly (see Figure 12-13 for the layout of the fill-and-spill structure), the plot begins at
$V_{in} - V_{ref} = -0.3$V, which is approximately the magnitude of the potential mismatch
between the two levels.

The standard deviation of the different outputs from the average is 31mV, with a
maximum deviation of 121mV. It should be noted that the differences in the outputs are a
combination of the mismatches between the source-follower buffers and the variations in
the channel potentials of the fill-and-spill input devices. Since there is only one input
structure for each processor, differences in the floating-gate amplifier responses within the same
array should be due only to mismatches in the source followers.

Figure 13-10: Measured floating gate amplifier output vs. fill-and-spill input, $V_{in} - V_{ref}$.

The total output swing of the floating-gate amplifiers is 2V, from a maximum of 2.8V
for $V_{in} - V_{ref} = -0.3$V to the minimum value of 0.8V at $V_{in} - V_{ref} = 1.4$V. The average
slope of the response curve over this range is -1.48. Correcting for the source-follower gain
of 0.895 thus gives the change in the floating-gate voltage per unit change in the applied
signal voltage as

$$\frac{\Delta V_g}{\Delta V_{sig}} \approx -1.65 \qquad (13.3)$$

where $V_{sig} = V_{in} - V_{ref} + \delta_{p1\text{-}p2}$, with $\delta_{p1\text{-}p2}$ representing the poly1-poly2 mismatch.
From equations (11.28) and (11.53), we have

$$\Delta V_g = \frac{Q_{sig}}{C_{load}}\left(\frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff,p1} + 1/C_{load}}\right) \qquad (13.4)$$

and

$$Q_{sig} = -C_{eff,p2}\,V_{sig} \qquad (13.5)$$

where the distinction has been made between the effective capacitances of the poly1 and
poly2 structures. Combining these equations gives

$$\frac{\Delta V_g}{\Delta V_{sig}} = -\frac{C_{eff,p2}}{C_{load}}\left(\frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff,p1} + 1/C_{load}}\right) \qquad (13.6)$$

Using the value of $C_{d2}/C_{eff} = 0.13$ computed from the channel potential
measurements, equations (13.3) and (13.6) are consistent with

$$\frac{C_{eff,p2}}{C_{load}} \approx 2 \qquad (13.7)$$

If we take the value of $9.2 \times 10^{-17}$ F/µm² computed for $C_{load}$ in equation (12.5), we thus
have $C_{eff,p2} = 1.8 \times 10^{-16}$ F/µm². With $\delta_{p1\text{-}p2} = 0.3$V and 1.4V being the maximum value
of $V_{in} - V_{ref}$, we find

$$N_{e,max} = \frac{C_{eff,p2}\,V_{sig,max}}{q} = 1912 \text{ electrons}/\mu\text{m}^2 \qquad (13.8)$$

The input gate area being 900µm², the maximum number of signal electrons is
approximately $1.7 \times 10^6$.
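The chain of numbers leading to equation (13.8) can be checked with a few lines of arithmetic. This sketch takes $C_{load}$ from equation (12.5), the ratio of about 2 from equation (13.7), and the 0.3V poly mismatch plus the 1.4V maximum input difference. Using the exact elementary charge gives about 1910 electrons/µm², within rounding of the 1912 quoted in the text; the last line also checks the correction behind equation (13.3).

```python
Q_E = 1.602176634e-19      # elementary charge (C)

C_LOAD = 9.2e-17           # F/um^2, from equation (12.5)
C_EFF_P2 = 1.8e-16         # F/um^2, about 2 * C_LOAD (equation 13.7)
DELTA_P1_P2 = 0.3          # V, poly1-poly2 mismatch
VIN_MINUS_VREF_MAX = 1.4   # V, maximum applied input difference
INPUT_GATE_AREA = 900.0    # um^2


def max_electron_density():
    """N_e,max = C_eff,p2 * V_sig,max / q, in electrons per um^2."""
    v_sig_max = VIN_MINUS_VREF_MAX + DELTA_P1_P2
    return C_EFF_P2 * v_sig_max / Q_E


if __name__ == "__main__":
    n_e = max_electron_density()
    print(f"C_eff,p2 / C_load = {C_EFF_P2 / C_LOAD:.2f}")
    print(f"N_e,max ~ {n_e:.0f} electrons/um^2")
    print(f"max signal electrons ~ {n_e * INPUT_GATE_AREA:.2e}")
    # Equation (13.3): measured slope divided by the source-follower gain.
    print(f"dVg/dVsig ~ {-1.48 / 0.895:.2f}")
```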

13.3.3 Dark current measurements 

In order to use CCDs for signal processing, all computations must be completed in less
time than it takes for dark current to appreciably affect the signals. Dark current levels for
the test chips were measured using the floating-gate amplifier structure by initializing the
node gate to its high voltage and then allowing it to float without introducing charge via
the gate array. Since dark current is the only source of charge into the potential well under
the floating gate, the magnitude of the current can be estimated by measuring the change
in the gate voltage over time.

Figure 13-11: Dark current accumulation for cooled and uncooled chips.

Figure 13-11 shows the results of two tests, one with the chip at room temperature and
the other with the chip cooled to approximately 0°C. The output voltages were sampled at
125µs intervals for 0.25 seconds. The values plotted are the computed gate voltages after
correcting for the responses of both the source follower on the floating-gate amplifier and
the output pad driver. From equation (13.4), the slopes of the curves are related to the
dark current density, $J_D$, by

$$\frac{\Delta V_g}{\Delta t} = \frac{J_D}{C_{load}}\left(\frac{1/C_{d2}}{1/C_{d2} + 1/C_{eff,p1} + 1/C_{load}}\right) \qquad (13.9)$$
Using the previously estimated values for $C_{load}$ and $C_{d2}/C_{eff}$, we thus have

$$J_D = \frac{\Delta V_g}{\Delta t} \times 1.1 \times 10^{-8}\ \text{A/cm}^2 \qquad (13.10)$$

with $\Delta V_g/\Delta t$ expressed in V/sec.

For the chips at room temperature, with a slope of $\Delta V_g/\Delta t = -1.74$V/sec, $J_D$ is
approximately $-19$nA/cm², while for the cooled chips, with $\Delta V_g/\Delta t = -0.38$V/sec, $J_D$ is only
$-4.2$nA/cm². It should be noted that the uncooled dark current levels fluctuate
tremendously as both room and chip operating temperatures vary. The uncooled slopes, however,
were never smaller in magnitude than 1.4V/sec and were often as large as 2.5V/sec. These
values are high with respect to commercial-grade CCDs, but are not unusual for the Orbit
process, which is not optimized for dark current². For tests involving total processing times
of more than 5ms, it was thus necessary to cool the chips to achieve proper operation.
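Equation (13.10) turns a measured slope directly into a dark current density. The sketch below applies it to the room-temperature and cooled slopes and, as a conversion added here for comparison with the roughly 1912 electrons/µm² signal capacity of equation (13.8), also expresses the result as electrons generated per µm² per second. The function names are hypothetical.

```python
Q_E = 1.602176634e-19      # elementary charge (C)
SCALE_A_PER_CM2 = 1.1e-8   # equation (13.10): J_D = (dVg/dt) * 1.1e-8 A/cm^2, dVg/dt in V/s
UM2_PER_CM2 = 1e8


def dark_current_density(slope_v_per_s):
    """Dark current density in A/cm^2 from the gate-voltage slope (V/s)."""
    return slope_v_per_s * SCALE_A_PER_CM2


def electrons_per_um2_per_s(j_a_per_cm2):
    """Convert a current density to electrons per um^2 per second (magnitude)."""
    return abs(j_a_per_cm2) / UM2_PER_CM2 / Q_E


if __name__ == "__main__":
    for label, slope in (("room temperature", -1.74), ("cooled to ~0 C", -0.38)):
        j = dark_current_density(slope)
        print(f"{label}: J_D ~ {j * 1e9:.1f} nA/cm^2, "
              f"~{electrons_per_um2_per_s(j):.0f} electrons/um^2/s")
```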

13.3.4 Transfer efficiency 

In order to measure charge transfer efficiency, a special structure composed of a chain
of 27 1-D unit cells, together with a boundary cell containing a fill-and-spill input device,
was laid out on the test chip. The 1-D cells are simply truncated and rearranged versions
of the 2-D unit cells with the vertical sections removed. The layout of these cells is such
that, when they are placed end-to-end, a linear array is formed which is identical in operation to the
2-D array, with the exception that there is no vertical movement of charge. Since each cell
contains 12 gates, the signal charge in the test array is transferred through 336 gates
(28 cells × 12 gates) from the end of the fill-and-spill structure to the final node gate, whose
floating-gate amplifier output is connected to one of the analog pad drivers.

As explained in Section 11.4, charge transfer efficiency can be measured from the ratio
of the size of the charge packet under the $N$th transfer gate to the original packet size at
gate 0. From equation (11.31), we find that

$$\frac{Q_N}{Q_0} = \epsilon^N \qquad (13.11)$$

To measure transfer efficiency with the given test structure, the array was initially flushed
by executing the horizontal transfer sequence many times (more than 100) with no input charge
introduced through the fill-and-spill device. The control voltages, $V_{ref}$ and $V_{in}$, were then
set to give a maximum-size charge packet, and the fill-and-spill and transfer sequences were
executed repeatedly to move a series of equal-size packets into the array. By measuring the
floating-gate amplifier output of the final node at the end of each transfer stage, the size
of the first charge packet in the series to arrive at the node can be compared with the
size of the following packets. After several transfer sequences, steady-state conditions will
be reached, in which as much charge is lost to the trailing packets as is picked up from the
previous ones, and the size of the packets will be approximately that of the original signal.

Figure 13-12 shows the results of one test which is typical of the behavior observed
in all of the chips. Approximating the floating-gate response as linear in $\Delta Q$,
the ratio $Q_N/Q_0$ is given by the ratio of the initial voltage drop from the zero-signal
level at stage 28, when the first packet arrives, to the final voltage drop measured several
stages later. As seen in the plot, this value is computed as $\Delta V_i/\Delta V_f = 0.17$, and from
equation (13.11) with $N = 336$, we find the per-gate transfer efficiency to be $\epsilon = 0.995$.
Using $N = 28$ in equation (13.11), we can also compute the transfer efficiency per
stage as $\epsilon_{stage} = 0.939$.

Figure 13-12: Charge transfer efficiency measurement.
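Equation (13.11) is inverted as $\epsilon = (Q_N/Q_0)^{1/N}$. The sketch below reproduces both numbers from the measured ratio of 0.17 and also reports the charge left behind per gate, $1 - \epsilon$, which is the quantity to set against the $10^{-5}$ implied by the 0.99999 target for large arrays. The function name is hypothetical.

```python
def per_transfer_efficiency(ratio: float, n_transfers: int) -> float:
    """Solve Q_N / Q_0 = eps ** N for eps."""
    return ratio ** (1.0 / n_transfers)


if __name__ == "__main__":
    measured_ratio = 0.17  # Delta V_i / Delta V_f from Figure 13-12
    eps_gate = per_transfer_efficiency(measured_ratio, 336)   # 28 cells x 12 gates
    eps_stage = per_transfer_efficiency(measured_ratio, 28)   # per 12-gate stage
    print(f"per-gate efficiency  ~ {eps_gate:.3f} (loss per gate {1 - eps_gate:.2e})")
    print(f"per-stage efficiency ~ {eps_stage:.3f}")
```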

The measured transfer efficiencies for the Orbit CCD process are very low compared with 
desired values of 0.99999 or better required for large gate arrays. As seen in Section 13.5, 
the low CTE of these devices does affect the results of the 32x32 prototype array and limits 
our ability to characterize very large structures. 

13.4 Differencing and Threshold Test Circuits 

Three isolated structures were laid out to test the operation of the edge processing
circuits contained in the unit cells. The two major components, the differential amplifier
and the voltage comparator, were set up to be tested individually, while a third structure
contained the complete absolute-value-of-difference and threshold test circuit.

13.4.1 Differential amplifier 

Differential amplifier responses were measured for twelve different test structures under
the same conditions ($V_{bias}$ = 1V, and common mode values of $V_c$ = 1.2V, 2.0V, and 2.8V)
used in the simulation described in Section 12.2.1. The results from one of the test circuits
are shown in Figure 13-13. Overall, the basic shape of the response curve is very similar to
that predicted by the simulation, with good common mode rejection and a differential gain,
$A_d$, of 7.2. However, for a given input difference, there are significant variations in the
output voltages of the twelve circuits, due to variations both in the amplifier offset voltages
and in the values of the common mode outputs, $V_{oc}$.

The average value of $V_{oc}$ among the twelve circuits was 3.43V, with a standard deviation
of 38mV. Referring to Figure 12-16, the common mode output voltage is determined by the
amount of drain current flowing through each side of the second-stage amplifier when it is
in its balanced state and by the effective resistance of the diode-connected p-fet load. Given
that the drain current in the balanced state is one-half of the current through the lower
bias transistor, it can easily be seen, using the notation of Section 12.2.1, that

$$V_{oc} = V_{DD} - |V_{tp}| - (V_{bias} - V_{tn})\sqrt{\frac{K_n (W/L)_{bias}}{2K_p (W/L)_2}} \qquad (13.12)$$

where $V_{tn}$ and $V_{tp}$ are the n- and p-fet threshold voltages. Since the quantity under the
radical is much larger than one, variations in $V_{oc}$ depend strongly on variations in the
threshold voltages of the bias transistors.

Figure 13-13: Measured differential amplifier characteristic for common mode voltages
$V_c$ = 1.2V, 2.0V, and 2.8V with $V_{bias}$ = 1V.
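Equation (13.12) also shows why a threshold shift in the bias transistor is amplified: $\partial V_{oc}/\partial V_{tn}$ equals the square-root factor, which exceeds one. The sketch below evaluates the expression and checks that sensitivity numerically. All device parameters in it ($V_{DD}$, the thresholds, the $K$ values, and the W/L ratios) are hypothetical placeholders, not values from the MSV design.

```python
import math


def v_oc(v_dd, v_tp_abs, v_bias, v_tn, k_n, k_p, wl_bias, wl_2):
    """Common mode output voltage per equation (13.12)."""
    root = math.sqrt(k_n * wl_bias / (2.0 * k_p * wl_2))
    return v_dd - v_tp_abs - (v_bias - v_tn) * root


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the formula.
    p = dict(v_dd=5.0, v_tp_abs=0.9, v_bias=1.0, v_tn=0.8,
             k_n=50e-6, k_p=20e-6, wl_bias=10.0, wl_2=1.0)
    base = v_oc(**p)
    dv = 1e-3  # perturb the n-fet threshold by 1 mV
    bumped = v_oc(**{**p, "v_tn": p["v_tn"] + dv})
    sensitivity = (bumped - base) / dv
    root = math.sqrt(p["k_n"] * p["wl_bias"] / (2.0 * p["k_p"] * p["wl_2"]))
    print(f"V_oc = {base:.3f} V, dVoc/dVtn = {sensitivity:.2f} (root factor {root:.2f})")
```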

The offset voltage of a differential pair, defined as the difference in the inputs $V_1 - V_2$
required to make the output voltages equal, is, from [98],

$$V_{OS} = \Delta V_t - \sqrt{\frac{I_D}{2K_n (W/L)}}\left[\frac{\Delta(1/g_m)}{1/g_m} + \frac{\Delta(W/L)}{W/L}\right] \qquad (13.13)$$

where $\Delta V_t$ is the difference in threshold voltages of the two input transistors, $I_D$ is the
average drain current through each side, $(W/L)$ is the average size ratio of the input transistors,
and $g_m$ is the average transconductance of the two p-fet loads. The quantities $\Delta(1/g_m)$
and $\Delta(W/L)$ represent the differences in these parameters between the two sides of the
differential pair.

Figure 13-14: Combined maximum differential amplifier outputs vs. $|V_1 - V_2|$ for common
mode voltages $V_c$ = 1.2V, 2.0V, and 2.8V with $V_{bias}$ = 1V.

The average offset voltage of the twelve circuits was +4.5mV, with a standard deviation
of 3.3mV. Given the very low bias current (approximately 400nA with $V_{bias}$ = 1V) and the large W/L
ratios used in the transistors, the variations in both $V_{OS}$ and $V_{oc}$ can be attributed almost
entirely to differences in the threshold voltages.

The effect of these variations on the overall operation of the MSV processor can be
judged from the plot, shown in Figure 13-14, of the maximum amplifier outputs for all
twelve circuits at each of the three common mode values against the absolute input difference
$|V_1 - V_2|$. This plot can be considered as a measurement of the resolution of the differential
amplifier and gives a lower bound on the resolution of the complete edge detection circuit.
For $|V_1 - V_2| < 0.1$V, the horizontal distance between the dashed lines, which approximates
the smallest measurable difference in the inputs, is roughly 28mV. For absolute differences
above 0.1V, however, the amplifier saturates rapidly, so that it is impossible to distinguish
between differences that are greater than 0.14V. When considered as a percentage of the full-scale
input range (FSR), if this is set to 1.4V, the 28mV to 140mV range of measurable differences
for the actual circuits (2% to 10% of FSR) is in fact better than the targeted swing of 2.5% to 10% of FSR.

Figure 13-15: Measured sense amplifier switching characteristic.



13.4.2 Sense amplifier voltage comparator 

The sense amplifier switching characteristic was measured by setting one input, $V_{da}$, to
a fixed value and varying the other input, $V_T$, until finding the point at which the comparator
output changed. Since, as explained in Section 12.2.2, the output is random when the two
inputs are closer than $\delta/2$, where $\delta$ is the resolution of the sense amplifier, the actual
measurement used to determine the switching point was the number of times out of 100
that the comparator output was high. Two values, $V_{T,low}$ and $V_{T,high}$, were thus measured
for each value of $V_{da}$, the first being the highest value for which the output was
high fewer than 1 time in 100, and the second being the lowest value for which the output was
high more than 99 times in 100.
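The 1-in-100 and 99-in-100 criteria can be illustrated with a behavioral sweep. This sketch models the comparator's random region as additive Gaussian noise of standard deviation `sigma` on $V_T - V_{da}$; that noise model, the 2mV sigma, the 0.5mV sweep step, and the function names are all assumptions for illustration, not a description of the actual sense amplifier. The width of the returned band plays the role of the resolution read from the envelope in Figure 13-15.

```python
import random


def count_high(v_t, v_da, sigma, trials, rng):
    """Number of trials in which the comparator output is high.

    Assumed model: output is high when V_T - V_da plus Gaussian noise is positive.
    """
    return sum(1 for _ in range(trials) if v_t - v_da + rng.gauss(0.0, sigma) > 0.0)


def switching_band(v_da, sigma, step=0.5e-3, span=50e-3, trials=100, seed=1):
    """Sweep V_T around V_da and return (V_T,low, V_T,high).

    V_T,low: highest V_T with fewer than 1 high output in `trials`.
    V_T,high: lowest V_T with more than trials - 1 high outputs.
    """
    rng = random.Random(seed)
    n_steps = int(round(2 * span / step))
    sweep = [v_da - span + i * step for i in range(n_steps + 1)]
    counts = [count_high(v, v_da, sigma, trials, rng) for v in sweep]
    v_low = max(v for v, c in zip(sweep, counts) if c < 1)
    v_high = min(v for v, c in zip(sweep, counts) if c > trials - 1)
    return v_low, v_high


if __name__ == "__main__":
    low, high = switching_band(v_da=2.0, sigma=2e-3)
    print(f"V_T,low={low:.4f} V, V_T,high={high:.4f} V, band={1e3 * (high - low):.1f} mV")
```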

The cumulative results from twelve different comparator circuits are plotted in Figure 13-15,
with the dashed lines representing the envelope of the high and low threshold voltages
from all twelve circuits. The maximum horizontal distance between these lines, which is a
measure of the sense amplifier resolution, is 9.8mV. Given that this value is divided by the
differential gain, $A_d$, when it is referred back to the input of the differential amplifier,
the 10mV resolution of the comparator should have a negligible effect on the overall edge
detector circuit performance.

It should be noted from the plot that there is a slight offset of approximately 15mV 
between the threshold and input voltages when the comparator switches. This offset is 
caused by the capacitive imbalance created by connecting an inverter input to the $V_T$ side.
Since the offset is constant, however, it can be compensated for in setting the value of the 
threshold voltage. 

13.4.3 Combined edge veto test circuit 

The complete multi-scale veto edge detection circuit, starting from the inputs to the 
source-follower buffers of the floating-gate amplifiers and ending with the output of the 
edge storage latch, was laid out as an individual test structure to evaluate the combined 
performance of all of its elements. The circuit was tested by providing a fixed voltage 
difference to the source-follower inputs and determining the maximum value of $V_T$ for which
the input difference could be considered as an edge. 

Figures 13-16 and 13-17 show the results from one circuit, plotted against both $V_1 - V_2$
and $|V_1 - V_2|$, for input common mode values of 4.2V, 3.3V, and 2.5V, corresponding
roughly to the high, medium, and low floating-gate voltages. The common mode variation
in the results reflects that of the differential amplifier, while the offset in the circuit is the
combination of both the differential amplifier offset and the mismatch in the source-follower
buffers.

Figure 13-16: Measured response of one absolute-value-of-difference circuit.
Figure 13-17: Measured response of one absolute-value-of-difference circuit, plotted against $|V_1 - V_2|$.

It should be noted that the curves flatten much more abruptly for absolute input differences
greater than IfOmV than those of the differential amplifier by itself. Closer analysis
of the combined circuit reveals that the cause of this abrupt flattening is the diminishingly
small current supplied to the sense amplifier by the high side of the differential amplifier output.
As the output approaches the saturation level, it becomes unable to charge the sense
amplifier input gates within the allotted precharge period. Increasing the precharge time is not an
effective way to widen the range of the absolute-value-of-difference circuit, however,
as the time needed rises very rapidly as the current goes to zero. The preferred method for
increasing the range is to raise the rail voltage, $V_{dd}$, on the differential amplifier, thereby
increasing its saturation voltage. Raising $V_{dd}$ will also increase power dissipation during
the edge detection cycles. However, as the time spent in edge detection is much less than
that required to load the image, the net increase in average power should be negligible.
Unfortunately, the test system was not designed to allow separate power supply voltages
for the edge detection circuits and the CCD clock drivers, and hence it was not possible to
implement this option in the present setup.

The composite results from twelve different absolute-value-of-difference (AVD) circuits
for common mode input voltages of 4.2V, 3.3V, and 2.5V, with $V_{dd}$ = 5V, are plotted
as individual points in Figure 13-18. The horizontal distance between the dashed lines
bounding the results indicates the overall resolution of the edge detection circuits for the
array processor. For values of $|V_1 - V_2| < 90$mV, this distance is approximately 65mV,
while for differences greater than fOOmV, the distance becomes infinite. The value of 65mV
would meet the lower bound of 2.5% of FSR for distinguishable differences if the input
range is 2.6V. The upper limit, however, is clearly inadequate for the desired 10% of FSR.
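The percentages used in this comparison follow from simple ratios; the helper below (its name is ours) reproduces them from the voltages quoted in this section, as a quick arithmetic check.

```python
def percent_of_fsr(diff_mv, fsr_v):
    """Express a voltage difference (mV) as a percentage of a full-scale range (V)."""
    return 100.0 * (diff_mv / 1000.0) / fsr_v

# Differential amplifier alone, assuming a 1.4V FSR: about 2% and 10%
print(percent_of_fsr(28, 1.4), percent_of_fsr(140, 1.4))
# Complete AVD circuit, assuming a 2.6V FSR: about 2.5%
print(percent_of_fsr(65, 2.6))
# Difference the AVD circuit would have to resolve to reach 10% of a 2.6V FSR (about 260mV)
print(0.10 * 2.6 * 1000)
```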

13.5 Operation of the Full Array Processors 

Two array sizes were built to test the operation of the complete MSV processor. A
32x32 array, the largest that would fit on the maximum available die size, was laid out
as a separate chip, while a smaller 4x4 array was included on the test structures chip.
Given the poor charge transfer efficiency measured for the CCDs and the limited resolution
of the AVD circuit, it was clear that it would not be possible to test very precisely the
processor's ability to discriminate between step edges, lines, and impulse noise as described
in Chapter 5. The low CTE in effect results in a pre-smoothing operation as the image
is loaded, while the limited AVD resolution restricts the number of smoothing cycles for
which interesting results can be obtained. Nonetheless, it was possible to test several general
characteristics of the array processors and verify that their overall operation was as planned.

Figure 13-18: Composite response of twelve AVD circuits.

The first test performed on the 4x4 and 32x32 arrays was to compare the I/O
characteristics of the different rows. This was done by loading an entire column with the same
value of $V_{in} - V_{ref}$ and measuring the floating-gate amplifier outputs of each row as the
column was shifted to the end of the array. The results are shown for all rows of one 4x4
array in Figure 13-19 and for rows 3 through 16 of one 32x32 array in Figure 13-20. The
curve for the 4th row of the smaller array is seen to be significantly shifted above those for
the other three rows due to charge loss in the input shift register. Since the last row is the
first to be input, it receives the smallest input charge packet, while the preceding rows
receive successively larger packets until steady-state conditions are reached (see Figure 13-12
for reference). The full effect of charge loss in the shift register is thus observed in the plot
of the 4x4 array output, while the output curves for the 14 rows at the top of the 32x32
array are indicative of the steady-state results. The chips were frozen prior to testing so
that dark current would not be a significant factor.

Figure 13-19: Floating-gate outputs of 4x4 array processor.
Figure 13-20: Floating-gate outputs of 32x32 array processor.
Figure 13-21: Smoothing of one-pixel impulse on 4x4 array (4x4 array output values): a.) no smoothing; b.) 1 smoothing cycle; c.) 2 smoothing cycles.

The small size of the 4x4 array was nonetheless convenient for testing the smoothing
and edge veto functions. A test input was provided to the array consisting of a single
pixel impulse with $V_{sig} = V_{in} - V_{ref} = 1.199$V at the 2nd row and 3rd column.

Figure 13-22: Edge detection results for 4x4 array with impulse input (panel c.: smoothing cycle 2, $T_2 = 3.848$).

Figure 13-21a shows the input data along with the floating-gate amplifier output of each
row with no smoothing. Again, due to the poor charge transfer efficiency, the impulse is
spread along the 2nd row over the first and second columns as well as the third. Referring
to the I/O transfer function curves plotted in Figure 13-19, the output value of 2.075V
measured at the original location of the impulse corresponds to an input of $V_{sig} \approx 0.985$V,
which is very close to the value of 0.943V predicted using a per-gate CTE of 0.995, given
that the charge is transferred through 48 gates before reaching the output device.
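The predicted 0.943V can be reproduced by compounding the per-gate loss over the 48 transfers. This sketch assumes, as the comparison in the text implicitly does, that the signal voltage scales linearly with the charge remaining in the packet; it ignores where the lost charge ends up (the trailing pixels).

```python
def remaining_fraction(cte, n_transfers):
    """Fraction of a charge packet left after n transfers at per-gate efficiency cte."""
    return cte ** n_transfers

v_sig_in = 1.199      # impulse amplitude V_in - V_ref (V), from the test input
cte_per_gate = 0.995  # per-gate charge transfer efficiency
n_gates = 48          # gates traversed before the output device

# Linear charge-to-voltage assumption: output signal = input signal * remaining fraction
v_expected = v_sig_in * remaining_fraction(cte_per_gate, n_gates)
print(f"{v_expected:.3f} V")  # 0.943 V
```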

The array outputs after one and two smoothing cycles are shown in Figures 13-21b and
13-21c next to the values predicted by applying the binomial kernels of equations (12.8)
and (12.9) directly to the unsmoothed outputs given in Figure 13-21a. Comparing the
results, it can be seen that the actual and predicted values are within 10-20mV of each
other, indicating that the smoothing operation does in fact closely approximate a 2-D
binomial convolution. It should be noted that this method for generating the theoretical
values is acceptable as long as the outputs lie within the (approximately) linear range of
the floating-gate amplifiers. Given the amount of variability in the I/O transfer functions
of each row, this method is also preferable to translating the output data to their
equivalent inputs and then performing the convolutions.
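The prediction step can be sketched as repeated separable binomial filtering of the unsmoothed output values. This is an illustration under stated assumptions, not a transcription of equations (12.8) and (12.9): it assumes one smoothing cycle equals the 3x3 kernel $\frac{1}{16}[1\;2\;1]^T[1\;2\;1]$ (so two cycles give the 5x5 binomial kernel), and it handles the array boundary by replicating edge pixels, which may differ from the CCD's actual boundary behaviour. The function names are hypothetical.

```python
def smooth_1d(row):
    """Apply the [1, 2, 1]/4 binomial kernel along one line, replicating edge pixels."""
    n = len(row)
    out = []
    for k in range(n):
        left = row[k - 1] if k > 0 else row[k]
        right = row[k + 1] if k < n - 1 else row[k]
        out.append((left + 2 * row[k] + right) / 4)
    return out

def smooth_cycle(img):
    """One assumed smoothing cycle: horizontal pass, then vertical pass."""
    rows = [smooth_1d(r) for r in img]
    cols = [smooth_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# Unit impulse in the middle of a 5x5 array, away from the boundary
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
one = smooth_cycle(impulse)
two = smooth_cycle(one)
print(one[2][2], one[2][3], one[1][1])  # 0.25 0.125 0.0625
print(two[2][2])                        # 0.140625, i.e. (6/16)**2
```

To mirror the procedure in the text, the input would be the measured, unsmoothed floating-gate outputs of Figure 13-21a rather than a unit impulse.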

The edge detection/veto results for the same input pattern are shown in Figures 13-22a
to 13-22c. The threshold values of $T_0 = 3.872$, $T_1 = 3.860$, and $T_2 = 3.848$ were chosen
from Figure 13-17 according to the expected differences in the pixel values at each smoothing
cycle. Edges are indicated in the diagrams by the horizontal and vertical lines between the
'X's that mark the pixels in the array.

With no smoothing, edges are found between every vertical pair of pixels in the first 
column and between the first and second rows, between the vertical pairs of columns 1-3 
in the second and third rows, and between each horizontal pair of the second row. Some of 
these edges are clearly in error. It turns out that the edges shown between the vertical pairs 
of the first column are meaningless since they are caused by a bad connection at pin 9 of 
the 'C40 bus interface which receives these signals (see the die photograph of the test chip 
in Figure 13-2). Unfortunately, this problem, which could be repaired only by rebuilding 
the 'C40 daughter board, was not discovered until relatively late in the testing phase when 
there was not enough time left to remake the board. 

With the exception of the edge between the vertical pair at the top of the last column,
which can be explained only by an extreme offset in the absolute-value-of-difference circuit
at that location, the other edges found appear to be plausible, as they occur around the
smeared impulse input. After one smoothing cycle, the edge between the horizontal pair at
the center of the second row is removed, and after two cycles, all of the edges except the
vertical edges between the first and second rows and the meaningless edges of column 1
are removed. The persistence of the edges between the top two rows may be partially
explained by the fact that the differences between rows 1 and 2 are not smoothed away as
strongly as those between rows 2 and 3, due to the effect of the array boundary. Another,
more likely, explanation, however, is the variation in edge threshold values between the
different AVD circuits.

Figure 13-23: Test image used on 32x32 processor and corresponding edges.

Edge detection on the 32x32 processor was also tested by supplying a sample input 
image and recording the edge outputs. The input image used in one test, which is shown 
at the left of Figure 13-23, was sampled down to 32x32 pixels from an original 256x256 
image. Edges are displayed by coloring one of the adjacent image pixels in different shades 
of gray. An image pixel which has a vertical edge directly above it is colored light gray, 
while a pixel with a horizontal edge on its left side is colored dark gray. If edges exist both 
above and to the left of an image pixel, it is colored black. 
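The display convention can be written out as a small mapping from the two edge maps to gray levels. This is a sketch: the numeric gray values, the white background for pixels with no edge, and the function name are our assumptions; only the light-gray, dark-gray, and black assignments come from the description above.

```python
# Hypothetical gray levels; the figure's actual shades are not specified.
WHITE, LIGHT_GRAY, DARK_GRAY, BLACK = 255, 170, 85, 0

def edge_display(edge_above, edge_left):
    """Build a display image from two equal-size boolean edge maps.

    edge_above[r][c]: edge between pixel (r, c) and the pixel directly above it.
    edge_left[r][c]:  edge between pixel (r, c) and the pixel on its left.
    """
    rows, cols = len(edge_above), len(edge_above[0])
    out = [[WHITE] * cols for _ in range(rows)]  # assumed background: no edge
    for r in range(rows):
        for c in range(cols):
            above, left = edge_above[r][c], edge_left[r][c]
            if above and left:
                out[r][c] = BLACK
            elif above:
                out[r][c] = LIGHT_GRAY
            elif left:
                out[r][c] = DARK_GRAY
    return out

print(edge_display([[False, True], [False, True]],
                   [[False, False], [True, True]]))  # [[255, 170], [85, 0]]
```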

The results, shown on the right side of Figure 13-23, are not simple to interpret given 
that we do not know the actual signal levels stored in the array. With the 32x32 array, the 
floating-gate outputs of only the top 16 rows were brought to output pads as this was the 
maximum number of A/D channels available in the test system. Even if we did have the 
outputs from all 32 rows, however, we would still not have an accurate representation of the 
internal signal levels, since they would be further distorted as they were transferred to the
output devices. Nonetheless, one can discern some general outlines, such as the edges found
around the face area and near the shoulders and neck. The vertical black line towards the
right-hand side of the edge image, on the other hand, is due to the previously discussed bad
connection at pin 9 of the 'C40 daughter board.

13.6 Recommendations for Improving the Design 

Several problems were uncovered in testing the circuits used in the MSV processors 
which prevented all of the design goals from being met. Some of these problems had a 
trivial solution, such as redesigning the test system to provide a separate rail voltage to the 
differential amplifiers, while others, such as the poor charge transfer efficiency of the CCDs, 
could not be solved without changing the fabrication process. Nonetheless, it is clear from 
the overall results of the individual test circuits and the edge detection and smoothing tests 
on the arrays that given the proper resources, a processor can be built which does meet the 
design goals specified for the system. In this sense, the results of this research are positive. 

After thoroughly studying the advantages and limitations of the current design, however,
several changes to the array architecture that would greatly improve its effectiveness in
the motion estimation system are now apparent. One problem with the present design is
that the unit cell, which measures 224μm × 224μm, is too large. Even if scaled down by a
factor of 4 to 56μm × 56μm, one could at best build a 160x160 array on a 1cm die, while
for reliable motion estimation, the minimum array size needed is closer to 256x256.
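A back-of-the-envelope check of these array sizes: dividing a 1cm die edge by the cell pitch bounds the number of cells per side. The sketch ignores pads and peripheral circuits; our reading is that those account for the gap between the raw bound of 178 and the 160x160 quoted in the text, which the source does not spell out. The helper names are ours.

```python
DIE_UM = 10_000  # 1cm die edge, in micrometres

def cells_per_side(pitch_um, die_um=DIE_UM):
    """Upper bound on cells along one die edge, ignoring all peripheral circuitry."""
    return die_um // pitch_um

def max_pitch_um(cells, die_um=DIE_UM):
    """Largest cell pitch that fits `cells` cells along one die edge (no periphery)."""
    return die_um / cells

print(cells_per_side(224))  # 44 with the present 224um cell
print(cells_per_side(56))   # 178 with the 56um cell, before any periphery
print(max_pitch_um(256))    # 39.0625um pitch needed for 256x256 on a 1cm die
```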

The unit cell could be greatly reduced by removing most of the edge detection circuits 
and bringing them outside the array, as shown in Figure 13-24, where they can be shared 
by all cells in one row or column. The only circuits absolutely needed at each pixel are the 
floating-gate amplifier for sensing signal levels and the latches for storing the vertical and 
horizontal edge charges. The new unit cell structure, illustrated in Figure 13-25, could be as 
much as a factor of two smaller than the current design. Furthermore, bringing the absolute- 
value-of-difference circuits outside the array increases the flexibility for further improving 
their design for better matching and higher resolution, as the constraints on circuit area are 
no longer as severe. 

Changing from pixel-parallel to row- and column-parallel processing will of course
increase the total time needed for edge detection. One advantage of the reduced structure,
however, is that it should be possible to use this array for imaging as well as for edge
detection since, unlike the present design, it does not require a large fraction of the total pixel
area to be allocated to n-wells and diffusions held at $V_{dd}$, which can both trap and sink
light-generated charge. The current array structure, on the other hand, is not as well suited
for imaging; if it were to be used in the motion estimation system, not only would a
secondary imaging device be needed, but the time required to load images into the array
would also have to be taken into consideration. The suggested design improvement would
thus not significantly affect overall processing time, and would in fact reduce the complexity
of the system by removing the need for the additional sensor.

Figure 13-24: Proposed architecture for focal-plane processor.

Making the suggested changes, it should be possible to build a 256x256 focal-plane MSV 
processor on a single chip using a 0.8μm, or smaller, CCD-CMOS process. In Chapter 17,
we will examine how the complete real-time motion estimation system could be assembled 
with this chip, along with the matching processors presented in Part III. 



Part III 

A Mixed Analog/Digital Edge Correlator

Chapter 14



Design Specifications 



It is useful to recall the basic steps of the matching procedure, presented in Chapter 6,
which are performed on each $M \times M$ block in the base edge map. The same notation is
used as before, where $b$ denotes the 1-bit value of an individual pixel in the block being
matched, and $s$ denotes the value of an individual pixel in the search window. In addition,
we let $P = M^2$ represent the total number of pixels in the block, and define $V_{ij}$ as the sum
of absolute values of difference computed at position $(i,j)$ in the search window, normalized
by $P$, so that $PV_{ij}$ is the raw count of differing pixels. The steps of the matching procedure
are summarized as follows:

1. Count the number of edge pixels in the block from the base edge map to find

$$\|B\| = \sum b$$

2. Compute $\alpha_1\|B\|$ and test that

$$V_l < \alpha_1\|B\| < V_h \tag{14.1}$$

where

$$V_h = \alpha_1\frac{P}{2}, \quad \text{and} \quad V_l = \alpha_2 P \tag{14.2}$$

and $\alpha_1, \alpha_2$ are constants chosen to allow acceptable detection and false-alarm rates,
satisfying

$$1 > \alpha_1 > 2\alpha_2 > 0 \tag{14.3}$$

as described in Section 6.1.

3. If $\alpha_1\|B\|$ is outside these bounds, stop. The block cannot produce an acceptable
match. Otherwise, for each position $(i,j)$ of the search window:

(a) Compute for each pixel in the block the binary function $\bar{b}s + b\bar{s}$, and sum these
values over the entire block to find the score

$$PV_{ij} = \sum \left(\bar{b}s + b\bar{s}\right) \tag{14.4}$$

(b) If $PV_{ij} < \alpha_1\|B\|$:

i. Store the current value of $(i,j)$ as a candidate match position.
ii. If $V_{ij} < V_{min}$, the current minimum score, set $V_{min} = V_{ij}$ and store the
current value of $(i,j)$ as the best position.

4. After the score has been computed at every position of the search window, if at least
one candidate match has been found, compute $\Delta x$ and $\Delta y$, the maximum spread in the
$x$ and $y$ coordinates of the candidates, and test if both $\Delta x < d_{max}$ and $\Delta y < d_{max}$,
where $d_{max}$ is the maximum possible spread for considering that the minimum is
unique and well localized. If the results of both tests are true, signal that the match
at the position of $V_{min}$ is acceptable.
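The four steps can be sketched in software as a reference model. This is a minimal illustration, not the hardware algorithm of the later chapters: the function name and data layout are hypothetical, offsets $(i,j)$ index rows and columns of the search window, and the spread test of step 4 is applied to each offset coordinate in turn.

```python
def match_block(block, window, alpha1, alpha2, d_max):
    """Reference sketch of the matching steps for one binary M x M block.

    block:  M x M list of 0/1 base edge pixels (b).
    window: list of 0/1 rows (s), at least M x M; every placement is tried.
    Returns the (i, j) of an accepted best match, or None.
    """
    m = len(block)
    p = m * m
    norm_b = sum(map(sum, block))                 # step 1: ||B||
    v_l, v_h = alpha2 * p, alpha1 * p / 2         # (14.2)
    if not (v_l < alpha1 * norm_b < v_h):         # steps 2-3: validation (14.1)
        return None
    threshold = alpha1 * norm_b
    candidates, best, v_min = [], None, float("inf")
    for i in range(len(window) - m + 1):
        for j in range(len(window[0]) - m + 1):
            # step 3(a): PV_ij = number of pixels where b XOR s = 1, as in (14.4)
            pv = sum(block[r][c] ^ window[i + r][j + c]
                     for r in range(m) for c in range(m))
            if pv < threshold:                    # step 3(b)
                candidates.append((i, j))
                if pv / p < v_min:                # V_ij = PV_ij / P
                    v_min, best = pv / p, (i, j)
    if not candidates:
        return None
    spread_i = max(c[0] for c in candidates) - min(c[0] for c in candidates)
    spread_j = max(c[1] for c in candidates) - min(c[1] for c in candidates)
    return best if spread_i < d_max and spread_j < d_max else None  # step 4

# Toy example: the block is planted at offset (2, 3) of an otherwise empty window.
block = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
window = [[0] * 9 for _ in range(8)]
for r in range(4):
    for c in range(4):
        window[2 + r][3 + c] = block[r][c]
print(match_block(block, window, alpha1=0.5, alpha2=0.1, d_max=2))  # (2, 3)
```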

In order for the matching circuit to produce useful data for computing motion, three
primary constraints must be satisfied. The first of these is that the block size, $P$, must
be large enough to ensure that we can find constants $\alpha_1$ and $\alpha_2$, as defined in step 2
above, to give adequate detection and false-alarm rates. The second constraint is that the
search window must be large enough to account for the maximum displacements in the
image plane caused by the motion. Finally, the circuit must be designed to compute the
quantities $\alpha_1\|B\|$ and $PV_{ij}$ with sufficient precision to accurately perform the validation
tests and find the minimum score.

These constraints, which will be examined more closely in the following sections, deter- 
mine the minimum design specifications for the matching processor. Of course, the best 
design, given the requirements of the motion system, will be the one that not only meets 
these specifications, but also consumes the least power and silicon area. In the next two 
chapters, we will look at several different implementations of the matching procedure to 
find the one which is best according to all of these criteria. 
14.1 Required Block Size

We can find the minimum required block size from equations (6.16) and (6.17) for the
mean and variance of $V_{ij}$ under the hypothesis, $H_0$, that the match is false:

$$\mu_{H_0,\|B\|} = p_s\left(1 - \frac{\|B\|}{P}\right) + \frac{\|B\|}{P}\left(1 - p_s\right) \tag{14.5}$$

and

$$\sigma^2_{H_0,\|B\|} = \frac{p_s(1 - p_s)}{P} \tag{14.6}$$

where $p_s$ is the probability that an individual pixel in the search window will be an edge
pixel. Rewriting (14.5) as $\mu_{H_0,\|B\|} = \|B\|/P + p_s(1 - 2\|B\|/P)$, and noting that $\|B\|/P < 1/2$,
due to the validation test (14.1), and $p_s > 0$, we know that

$$\mu_{H_0,\|B\|} > \frac{\|B\|}{P} \tag{14.7}$$

Given that $P = M^2$, and that $p_s(1 - p_s) \le 1/4$, we also have the following bound on the variance

$$\sigma^2_{H_0,\|B\|} \le \frac{1}{4M^2} \tag{14.8}$$

with equality being achieved only for $p_s = 1/2$.
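As a sanity check on the mean and variance under $H_0$, the sketch below enumerates every search-window pattern for a tiny block and compares the exact moments of $V_{ij}$ with (14.5) and (14.6). It assumes the model behind those formulas: each search-window pixel is independently an edge with probability $p_s$. The block and $p_s$ value are arbitrary illustrations, and the function names are ours.

```python
from itertools import product

def exact_moments(block_bits, p_s):
    """Exact mean and variance of V = (number of b XOR s mismatches) / P, summing over
    every window pattern s, each pixel independently an edge with probability p_s."""
    p = len(block_bits)
    mean = second = 0.0
    for s_bits in product((0, 1), repeat=p):
        k = sum(s_bits)
        prob = p_s ** k * (1 - p_s) ** (p - k)
        v = sum(b ^ s for b, s in zip(block_bits, s_bits)) / p
        mean += prob * v
        second += prob * v * v
    return mean, second - mean ** 2

def formula_moments(norm_b, p, p_s):
    """Mean (14.5) and variance (14.6) under H0."""
    frac = norm_b / p
    return p_s * (1 - frac) + frac * (1 - p_s), p_s * (1 - p_s) / p

block = (1, 0, 0, 1, 0, 0, 0, 0, 1)  # P = 9, ||B|| = 3, so ||B||/P < 1/2
print(exact_moments(block, 0.3))
print(formula_moments(sum(block), len(block), 0.3))
```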

In order to give a low false-alarm rate, the threshold, $r$, for deciding to consider a match
as a candidate is chosen such that

$$\sigma_{H_0,\|B\|} < \left|r - \mu_{H_0,\|B\|}\right| \tag{14.9}$$

Let $n$ be the smallest number such that

$$\left|r - \mu_{H_0,\|B\|}\right| > n\,\sigma_{H_0,\|B\|} \tag{14.10}$$

for all permissible values of $\|B\|$. From the validation test (14.1), we have

$$\frac{\|B\|}{P} > \frac{\alpha_2}{\alpha_1} \tag{14.11}$$

while the cutoff threshold is given by $r = \alpha_1\|B\|/P$. Combining these equations with the
inequality (14.7) gives

$$\left|r - \mu_{H_0,\|B\|}\right| > \frac{\|B\|}{P}(1 - \alpha_1) > \frac{\alpha_2}{\alpha_1}(1 - \alpha_1) \tag{14.12}$$

We can thus ensure that the inequality (14.10) is satisfied for some suitably large value of
$n$ by choosing $M$ such that, given $\alpha_1$ and $\alpha_2$,

$$\frac{\alpha_2}{\alpha_1}(1 - \alpha_1) > \frac{n}{2M} \tag{14.13}$$

or,

$$M > \frac{n}{2(\alpha_2/\alpha_1)(1 - \alpha_1)} \tag{14.14}$$

In using these bounds to determine $M$, we need to set $\alpha_1$ as large as possible and
$\alpha_2/\alpha_1$ as small as possible so that a sufficiently large number of blocks from the base edge
map will pass the validation test (14.1). Otherwise, it may not be possible to find enough
correspondence points to obtain good motion estimates.

The values of $\alpha_1$ and $\alpha_2/\alpha_1$ most often used in testing the matching procedure on real images
were 0.5 and 0.15, respectively. For the tests presented in Chapter 6, $M$ was set equal to 24
pixels, giving $n > 3.6$. As can be judged from the quality of the motion estimates listed in
Chapter 7, this value was adequate for achieving a low error rate. Based on these results,
we will thus require that the matching circuit be able to accommodate block sizes of at least
24x24.
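The trade-off in (14.13) and (14.14) is easy to tabulate. The sketch below (function names ours) computes the separation $n = 2M(\alpha_2/\alpha_1)(1 - \alpha_1)$ reached at the boundary of (14.13), and the smallest integer block side for a target $n$; a small tolerance guards the ceiling against floating-point round-off.

```python
import math

def guaranteed_n(m, alpha1, ratio):
    """n at the boundary of (14.13): n = 2 M (alpha2/alpha1) (1 - alpha1)."""
    return 2 * m * ratio * (1 - alpha1)

def min_block_side(n, alpha1, ratio):
    """Smallest integer M meeting (14.14), boundary included."""
    m = n / (2 * ratio * (1 - alpha1))
    return math.ceil(m - 1e-9)  # tolerance guards against values like 24.000000000000004

print(guaranteed_n(24, 0.5, 0.15))    # about 3.6, the case used in Chapter 6
print(guaranteed_n(24, 0.5, 0.125))   # 3.0, the power-of-two case
print(min_block_side(3.6, 0.5, 0.15)) # 24
print(min_block_side(4.0, 0.5, 0.15)) # 27: a larger block buys a larger n
```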

14.2 Required Search Area 

In Section 2.3, image plane displacements were calculated, assuming a focal length
of $f$ = 200 pixels, for three special cases: pure translation along the $x$ direction, pure
translation along the $z$ direction, and pure rotation of $\theta$ about $\omega = y$. At a frame interval of
1/30 sec, the maximum absolute displacements were found to be

1. For pure translation along $x$: $88.8/Z$ pixels, where $Z$ is the distance in meters to the
object being viewed.

2. For pure translation along $z$: $(x, y)/300$ pixels, where $(x, y)$ are image plane coordinates
measured from the principal point, and

3. For pure rotation of $\theta$ about $\omega = y$: 3.5 pixels per degree of rotation, at the center of
the image.

These motions are quite typical of those that could be encountered in actual imaging 
situations. With $f$ = 200 pixels, a sensor size of 256x256 pixels corresponds approximately
to a field of view of 32.6°, as measured from the z axis, which is close to the largest that 
can be obtained from ordinary lenses without significant distortion. 

As can be seen from a few rough calculations, the largest displacements are caused by 
rotation about an axis parallel to the image plane. For example, with 5° of rotation about
y, the smallest offset is 17.5 pixels at the center of the image. Combined with a translation 
along x, the displacement could easily exceed 30 pixels if there are objects in the scene 
closer than 10 meters. Even if the motion is primarily a translation along the z axis, as 
would be the case for a camera mounted on the front of a car, any small rotation caused by 
vibrations or by turning can result in large offsets. 
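These rough figures can be reproduced with a pinhole model. The sketch assumes $f$ = 200 pixels as in Section 2.3, considers only points on the row $y = 0$, ignores sign conventions, and simply adds the rotational and translational displacements, a small-motion approximation. The translation term uses the $88.8/Z$ result quoted above; the function names are ours.

```python
import math

F_PIX = 200.0  # focal length in pixels, as assumed in Section 2.3

def half_fov_deg(half_width_pix, f=F_PIX):
    """Angle between the z axis and the sensor edge, in degrees."""
    return math.degrees(math.atan(half_width_pix / f))

def rotation_disp(x_pix, theta_deg, f=F_PIX):
    """Horizontal displacement (pixels) of a point at column offset x_pix on the row
    y = 0, for a pure rotation of theta_deg about the y axis (pinhole model)."""
    phi = math.atan(x_pix / f)
    return f * math.tan(phi + math.radians(theta_deg)) - x_pix

def translation_x_disp(z_m):
    """Section 2.3 result for pure translation along x: 88.8 / Z pixels (Z in meters)."""
    return 88.8 / z_m

print(round(half_fov_deg(128), 1))    # 32.6 for a 256-pixel-wide sensor
print(round(rotation_disp(0, 1), 2))  # 3.49 pixels per degree at the center
print(round(rotation_disp(0, 5), 1))  # 17.5 for 5 degrees
print(rotation_disp(128, 5) > rotation_disp(0, 5))   # True: larger off-center
print(rotation_disp(0, 5) + translation_x_disp(5.0))  # combined, object at Z = 5 m
```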

We must thus plan for relatively large search windows of anywhere from 30x30 to
200x200 pixels¹. Of course, increasing the pixel size would reduce the magnitude of the
displacements in number of pixels. However, it would also increase the actual image area 
covered by a single block since the required number of pixels per block would not change. 
As the area covered by the blocks becomes larger with respect to the total sensor area, not 
only can fewer blocks containing different features be extracted from the image, but the 
error in assigning the correspondence point to the center of the best matching block in the 
second image will increase. 

In addition to requiring large search windows, we should also require the size of the 
window to be adjustable according to the type of motion which is expected. For example, 
if the motion is primarily in the x direction, or if the y axis is the primary axis of rotation, 
as in the case of the turning car, the image plane displacements will mostly be in the x 
direction, and hence there is no point in wasting time searching over large y offsets. 

It should be noted that the search area requirements for the matching circuits to be used
in the motion system are very different from those of the motion estimation chips typically
used in video applications for camera stabilization or image sequence compression. In these
applications, the camera is mostly stationary while the motion in the scene is caused by the
people or objects being filmed. The differences in successive frames in these situations are
usually small, and it is commonly assumed that maximum displacements are on the order
of ±8 pixels.

¹ The window sizes for the astronaut and lab sequences shown in Part I were 120x120 and 200x60,
respectively.

14.3 Precision 

There are three tests performed to validate a candidate match. The first two test the
eligibility of the entire block by verifying that the total number of edge pixels is within the
acceptable upper and lower bounds. Combining equations (14.1) and (14.2), we have

$$\alpha_2 P < \alpha_1\|B\| < \alpha_1\frac{P}{2} \tag{14.15}$$

The above expression is written in a manner to emphasize that the known quantities $V_l =
\alpha_2 P$ and $V_h = \alpha_1 P/2$ should be premultiplied and fed directly into the matching circuit,
rather than be computed on-chip. Only the value of $\alpha_1\|B\|$ needs to be computed by the
circuit, and this can be done as the block is being read in. The precision required for
representing $\alpha_1\|B\|$ depends on whether the comparison is performed digitally or in analog.
In an analog circuit, we only need to ensure that the difference $V_h - V_l$ is large enough to
discriminate a sufficient number of different levels in the value of $\alpha_1\|B\|$. In a digital circuit,
if $\alpha_1$ and $\alpha_2$ are negative powers of 2, the operations required to perform the comparisons
are trivial. If more precision is required, however, floating-point arithmetic must be used.

The closest powers of 2 to the values of 0.5 and 0.15 used in simulating the matching
procedure on the test image sequences are $\alpha_1 = 1/2$ and $\alpha_2/\alpha_1 = 1/8$. These numbers may
be adequate for many images; however, the lower value for $\alpha_2/\alpha_1$ will increase the
false-alarm rate. From equation (14.13) with $M = 24$, the values $\alpha_1 = 0.5$ and $\alpha_2/\alpha_1 = 0.125$
give $n > 3$, as opposed to $n > 3.6$ with $\alpha_2/\alpha_1 = 0.15$. Furthermore, there is not much
flexibility for tuning if $\alpha_1$ and $\alpha_2$ are restricted to negative powers of 2. Not implementing
the multiply and compare operations with some form of extended precision arithmetic will
thus reduce the robustness of the system.
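To illustrate why power-of-2 constants make the digital comparisons trivial, the sketch below performs the block validation (14.15) and the per-offset threshold test (14.16) with shifts and integer compares only. It assumes $\alpha_2/\alpha_1 = 2^{-k}$ and $\alpha_1 = 2^{-a}$; the function names are hypothetical. Dividing (14.15) through by $\alpha_1$ shows that the block validation depends only on the ratio $\alpha_2/\alpha_1$ (that observation follows directly from (14.15) and is ours, not stated in the text).

```python
def block_is_valid(norm_b, p, log2_ratio=3):
    """Test (14.15) with alpha2/alpha1 = 2**-log2_ratio.

    Dividing by alpha1: (alpha2/alpha1) * P < ||B|| < P / 2, i.e.
    P < (||B|| << log2_ratio)  and  (||B|| << 1) < P, with no rounding.
    """
    return p < (norm_b << log2_ratio) and (norm_b << 1) < p

def is_candidate(pv, norm_b, log2_inv_alpha1=1):
    """Test (14.16), PV_ij < alpha1 * ||B||, with alpha1 = 2**-log2_inv_alpha1,
    rewritten as (PV_ij << log2_inv_alpha1) < ||B||."""
    return (pv << log2_inv_alpha1) < norm_b

# 24x24 block (P = 576) with alpha1 = 1/2 and alpha2/alpha1 = 1/8:
# valid blocks satisfy 72 < ||B|| < 288.
print(block_is_valid(72, 576), block_is_valid(73, 576))    # False True
print(block_is_valid(287, 576), block_is_valid(288, 576))  # True False
print(is_candidate(10, 21), is_candidate(10, 20))          # True False
```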
The third validation test is to compare

$$PV_{ij} < \alpha_1\|B\| \tag{14.16}$$

at each position $(i,j)$ of the search window. The value of $PV_{ij}$ can be anywhere from 0 to
$P$, although, since $\alpha_1\|B\| < \alpha_1 P/2$, we only need to represent values up to $\alpha_1 P/2$. With
$\alpha_1 = 0.5$ and $P = 576$, $\alpha_1 P/2 = 144$, which requires 8 bits to represent digitally. Precision
issues with the representation of $PV_{ij}$, as well as $\alpha_1\|B\|$, are thus a concern primarily for
analog implementations. It is certainly not necessary to require the circuit to discriminate
a full 144 levels; however, we do need to ensure that the threshold test (14.16) is accurate to
at least a fraction of $\sigma_{H_0,\|B\|}$, and we also need to ensure that the circuit can discriminate
between different candidate minimum values of $PV_{ij}$.

Any difference between scores that is within one standard deviation of the expected
minimum value cannot be considered significant. From equation (6.14), the mean and
variance of $PV_{ij}$ under the hypothesis $H_1$ that the match is correct are given by

$$E[PV_{ij} \mid H_1] = P p_n \tag{14.17}$$

$$\mathrm{Var}(PV_{ij} \mid H_1) = P p_n (1 - p_n) \tag{14.18}$$

where $p_n$ is the probability of an edge pixel being turned on or off by noise. Supposing
$p_n = 0.05$, with $M = 24$ we have

$$\sigma(PV_{ij} \mid H_1) = \sqrt{\mathrm{Var}(PV_{ij} \mid H_1)} = M\sqrt{p_n(1 - p_n)} = 5.23 \tag{14.19}$$

In order to have $\sigma(PV_{ij} \mid H_1) < 1$, it would be necessary to have $p_n < 0.00174$, which is
much lower than can be reasonably expected. Being able to discriminate between scores
that are within 3 or 4 votes of each other should thus be sufficient. From inequality (14.8),
we also see that

$$\sigma_{H_0,\|B\|} \le \frac{1}{2M} = \frac{1}{48} \tag{14.20}$$

and hence requiring that the circuit be able to discriminate more than 48 different values of
$\alpha_1\|B\|$ will ensure that the threshold test (14.16) can be performed to an accuracy greater
than $\sigma_{H_0,\|B\|}$.
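The numbers in this section can be checked directly. The sketch below (helper names ours) evaluates (14.19), solves $M^2 p_n(1 - p_n) = 1$ for the largest noise probability that would give unit standard deviation, and counts the bits needed for $\alpha_1 P/2$ and the levels implied by (14.20), for $M = 24$ and $\alpha_1 = 0.5$.

```python
import math

def sigma_pv_h1(m, p_n):
    """(14.19): standard deviation of PV_ij under H1, M * sqrt(p_n (1 - p_n))."""
    return m * math.sqrt(p_n * (1 - p_n))

def max_noise_for_unit_sigma(m):
    """Smaller root of M^2 p (1 - p) = 1, i.e. the largest p_n below 1/2 with sigma <= 1."""
    return (1 - math.sqrt(1 - 4 / m**2)) / 2

M, ALPHA1 = 24, 0.5
P = M * M
print(round(sigma_pv_h1(M, 0.05), 2))         # 5.23
print(round(max_noise_for_unit_sigma(M), 5))  # 0.00174
print(int(ALPHA1 * P / 2).bit_length())       # 8 bits to represent alpha1 * P / 2 = 144
print(2 * M)                                  # 48 levels, since sigma_H0 <= 1/(2M)
```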
Chapter 15

Case Study: A Purely Digital Design



The matching procedure can be implemented by two very different architectures, both 
involving fully parallel array processing. In the first method, pixels from the block being 
matched are stored at the nodes of the array while the pixels from the search window are 
shifted across it. A score is computed at each shift cycle and compared with the current 
minimum value. If it is smaller, the minimum value is updated and the position of the 
search window is recorded. Once the entire search window has been processed, if all of the 
validation tests have been passed, the offset corresponding to the minimum score is reported 
as the position of the best match. 

In the second architecture, which has been used in some commercially available motion
estimation chips¹, the entire search window is stored in a processor array where each node
corresponds to a given offset. Each pixel from the base image block is broadcast to the 
entire array so that its difference with every pixel of the search window can be computed 
simultaneously and the results added to the current scores stored at each node. The pixels 
of the search window are then shifted so that their offset relative to the next base pixel to 
be processed corresponds to the offset assigned to their new array location. Once all the 
pixels from the base image block have been processed, the validation tests are performed 
and the scores stored at each node are compared to find the one with the minimum value. 

In the next two sections, I will discuss how the matching circuit could be designed, given 
the constraints presented in the last chapter, using each of these architectures. 



¹ For example, the STI3220 motion estimation processor from SGS Thomson, which was briefly discussed
in Chapter 3.
15.1 First Method: Moving the Search Window

The structure of the $M \times M$ processing array needed for the first architecture is shown
in Figure 15-1, with the block diagram of the individual processing cells given in Figure 15-2.
Each bit from the base image block is loaded and stored in a latch for the duration of the
search. During the load phase, the edge pixels are counted and the value of $\alpha_1\|B\|$, as
defined in the preceding chapter, is computed and stored in a register so that it can be used
for the threshold test performed on each score. The validation test comparing $\alpha_1\|B\|$ to
the external inputs $V_h = \alpha_1 P/2$ and $V_l = \alpha_2 P$ can be performed once the entire block from
the base image is loaded, to determine whether it is necessary to start the search procedure.
Scoring begins as soon as the block corresponding to the first offset in the search window
is moved into position by means of the shift register cells located at each processing node.

The complexity of this design lies not in computing the score at each pixel, which is a
simple XOR operation, but in tallying the scores from every node in the array. The scores
must be counted as quickly as possible, since there are many offsets in the search window
and many blocks in the base image to match. Unless prevented by space restrictions on the
chip, the tally function should thus be implemented as a single combinational logic circuit
to avoid wasting cycles. Furthermore, even though we only need to represent numbers up to
the value of α₁‖B‖, all of the votes must be counted, as we cannot know in advance which
nodes will have a 'high' output and which will not.
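The scoring step can be summarized in a short behavioral sketch. The Python/NumPy code below is not a model of the circuit. It assumes that the edge maps are stored as 0/1 arrays and that the search window is (M + W₁ − 1) × (M + W₂ − 1) pixels, giving W₁ × W₂ offsets. It also assumes that a score passes the threshold test when it is below α₁‖B‖, with α₁ = 0.5 (the value later hardwired in Chapter 16). The function names are hypothetical.

```python
import numpy as np

def score_offsets(base, window):
    """Mismatch score for every offset of an M x M base block in a search window.

    base   -- M x M array of 0/1 edge bits (held fixed in the node latches)
    window -- (M + W1 - 1) x (M + W2 - 1) array of 0/1 edge bits
    Returns a W1 x W2 array; entry (u, v) is the number of XOR votes
    when the block is aligned with window[u:u+M, v:v+M].
    """
    base = np.asarray(base, dtype=np.uint8)
    window = np.asarray(window, dtype=np.uint8)
    m = base.shape[0]
    w1 = window.shape[0] - m + 1
    w2 = window.shape[1] - m + 1
    scores = np.empty((w1, w2), dtype=np.int64)
    for u in range(w1):              # in hardware, one offset per tally cycle
        for v in range(w2):
            votes = base ^ window[u:u + m, v:v + m]   # one XOR per node
            scores[u, v] = int(votes.sum())           # the tally circuit
    return scores

def passes_threshold(score, base, alpha1=0.5):
    """Threshold test: a score is a candidate only if it is below alpha1*||B||."""
    edge_count = int(np.count_nonzero(base))          # ||B||, counted at load time
    return score < alpha1 * edge_count
```

Each pass through the inner loop corresponds to one offset, so the running time of this architecture grows with W₁W₂. The XOR is trivial; the sum plays the role of the tally circuit whose design is discussed next.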

Building a tally circuit to count votes from M² nodes is expensive both in area and in
delay. Figure 15-3 shows the construction procedure for building a (2ⁿ − 1)-vote counter
with an n-bit output. A full adder can tally the three votes from p₀, p₁, and p₂, giving the
2-bit output (t₁t₀), where

$$ t_0 = \bar{p}_0 \bar{p}_1 p_2 + \bar{p}_0 p_1 \bar{p}_2 + p_0 \bar{p}_1 \bar{p}_2 + p_0 p_1 p_2 \qquad (15.1) $$

and

$$ t_1 = p_0 p_1 + p_0 p_2 + p_1 p_2 \qquad (15.2) $$

A 7-vote tallier can be constructed by adding the results from two 3-vote talliers and
connecting the seventh node to the carry-in input of the 2-bit adder. Generalizing this
procedure, it is easily seen that a (2ⁿ − 1)-vote counter can be built from two
(2ⁿ⁻¹ − 1)-vote talliers and one (n − 1)-bit adder, as shown in Figure 15-3c.
Figure 15-1: Processing array with base image block held in fixed position.
Let A_n denote the number of full adders required to implement a (2ⁿ − 1)-vote counter.
Clearly, A_n satisfies the recursive relation

$$ A_n = 2A_{n-1} + n - 1 \qquad (15.3) $$

and since A₁ = 0, it is easily verified that the solution to this recursion is given by

$$ A_n = 2^n - n - 1 \qquad (15.4) $$
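The recursive construction and the adder count of equation (15.4) can be checked with a bit-level sketch. The helper names (`TallyBuilder`, `to_int`) are hypothetical. The code builds a (2ⁿ − 1)-vote counter as described above: two (2ⁿ⁻¹ − 1)-vote counters feed one (n − 1)-bit ripple-carry adder, and the remaining vote goes to that adder's carry-in. The builder also counts the full adders it instantiates.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry), i.e. (t0, t1) of (15.1)-(15.2)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry

class TallyBuilder:
    """Counts votes with the recursive construction of Figure 15-3 and
    records how many full adders are used (hypothetical helper)."""

    def __init__(self):
        self.adders = 0

    def _add(self, x, y, cin):
        # Ripple-carry addition of two little-endian bit lists of equal width.
        out, carry = [], cin
        for a, b in zip(x, y):
            s, carry = full_adder(a, b, carry)
            self.adders += 1
            out.append(s)
        return out + [carry]

    def count(self, votes):
        """Tally 2**n - 1 votes (0/1); returns n bits, least significant first."""
        k = len(votes)
        if k == 1:
            return [votes[0]]
        half = (k - 1) // 2                       # 2**(n-1) - 1 votes per subtree
        left = self.count(votes[:half])
        right = self.count(votes[half:2 * half])
        return self._add(left, right, votes[-1])  # last vote on the carry-in

def to_int(bits):
    return sum(b << i for i, b in enumerate(bits))

builder = TallyBuilder()
result = builder.count([1, 0, 1, 1, 0, 1, 1])     # a 7-vote counter (n = 3)
print(to_int(result), builder.adders)              # prints: 5 4
```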

The full (2ⁿ − 1)-input tally circuit can be visualized as a tree with n − 1 levels, having
2ⁿ⁻ⁱ⁻¹ i-bit adders at level i. The total delay in the circuit is thus the sum of the
worst-case delays in the n − 1 levels. Two choices with different worst-case delays are
possible for implementing the i-bit adders. The first is ripple-carry, in which the carry bits
propagate sequentially through the i full adders; the other is carry-lookahead, in which the
carry bits are generated in parallel with combinational logic. If ripple-carry adders are
used, the worst-case delay for an i-bit adder is id, where d is the delay of a single full
adder. The maximum delay for the full tally circuit would then be

$$ D_n = \sum_{i=1}^{n-1} i\,d = \frac{n(n-1)}{2}\,d \qquad (15.5) $$

which increases as n². If carry-lookahead adders are used, the delay can be reduced to O(n),
but at the cost of more complexity in the adder circuits [102].
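Equation (15.5) and the level structure of the tree can be tabulated directly. Like the text, this sketch assumes that the worst-case delays of successive levels add. Delay is expressed in units of the full-adder delay d.

```python
def ripple_tree_delay(n, d=1.0):
    """Worst-case delay of a (2**n - 1)-vote ripple-carry tally tree, eq. (15.5).

    Level i (i = 1 .. n-1) is built from i-bit ripple-carry adders whose
    worst-case delay is i*d; the level delays are added as in the text.
    """
    return sum(i * d for i in range(1, n))

def adders_per_level(n):
    """Number of i-bit adders at each level i of the tree: 2**(n - i - 1)."""
    return {i: 2 ** (n - i - 1) for i in range(1, n)}

for n in (3, 5, 10):
    full_adders = sum(i * c for i, c in adders_per_level(n).items())
    print(n, ripple_tree_delay(n), full_adders)   # e.g. "10 45.0 1013" for n = 10
```

Summing i · 2ⁿ⁻ⁱ⁻¹ over the levels reproduces A_n = 2ⁿ − n − 1, which is a useful cross-check on equation (15.4).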

Equation (15.4) is most useful when the number of votes to count is one less than a
power of 2. In the last chapter, it was determined that the minimum block size should be
24x24, meaning that 576 votes need to be counted, with 10 output bits to represent the
answer. Applying equation (15.4) with n = 10, we would need 1013 full adders to implement
this circuit, which is clearly more than are actually necessary. The most efficient design,
in terms of both area and interconnect requirements, is to build one 24-vote tallier per row
and to sum the outputs from each row with an adder tree, as shown in the block diagram
of Figure 15-1. We can build the row tallier with one 15-vote counter, requiring 11 adders;
one 7-vote counter, requiring 4 adders; and one adder to count the remaining two nodes.
One 3-bit adder and one 4-bit adder are then needed to combine the results for the entire
row, for a total of 23 1-bit full adders. Summing the results from all 24 rows requires 12
5-bit adders, 6 6-bit adders, 3 7-bit adders, and 2 8-bit adders, giving a total of 133 1-bit
adders. The entire circuit thus consists of 24 × 23 + 133 = 685 1-bit adders.
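The adder bookkeeping for the 24x24 design reduces to a few lines of arithmetic. The Python sketch below reuses equation (15.4) for the 15-vote and 7-vote counters.

```python
def tally_adders(n):
    """Full adders in a (2**n - 1)-vote counter, A_n = 2**n - n - 1 (eq. 15.4)."""
    return 2 ** n - n - 1

# One 24-vote row tallier: a 15-vote counter, a 7-vote counter, one full adder
# for the last two votes, then a 3-bit and a 4-bit adder to merge the partial counts.
row_adders = tally_adders(4) + tally_adders(3) + 1 + 3 + 4

# Adder tree over the 24 five-bit row counts, widths and counts as in the text.
tree = {5: 12, 6: 6, 7: 3, 8: 2}          # adder width (bits) -> number of adders
tree_adders = sum(width * count for width, count in tree.items())

total = 24 * row_adders + tree_adders
print(row_adders, tree_adders, total)      # prints: 23 133 685
print(tally_adders(10))                    # prints: 1013 (single-tree alternative)
```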

One possible layout for a single node of the array is shown in Figure 15-4, with the
equivalent circuit diagram in Figure 15-5. The top half of the layout contains the latch which
stores the bit b, the shift register where the bit from the search window is temporarily stored,
and the XOR circuit which computes the score for the node. The bottom half contains a
1-bit full adder and is laid out so that its width matches that of the top half of the cell as
closely as possible, to minimize wasted space. A single row of the processor array, along
with its row tallier, can thus be formed by abutting 24 of the cells shown in the diagram,
with one missing the adder section. To implement the tally function, an additional twelve
horizontal lines of metal interconnect must be placed beneath the cell so that the inputs
and outputs of the adder circuits can be properly wired. As shown in the diagram, the cell
layout measures 340λ(h) × 260λ(w). With the additional metal lines, it will measure at
least 412λ(h) × 260λ(w).

The control lines carrying the clock signals for phasing the shift register and latching
the bits from the base image block run vertically across the cell so that rows can be abutted
to form the full array. The complete 24x24 array will thus measure at least 9888λ(h) ×
6240λ(w). In addition to the area taken by the array, the summation circuit that adds the
results from all of the rows will require a minimum of 133 times the area of a single full
adder, which, as shown, measures 134λ(h) × 196λ(w).
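The array dimensions follow from multiplying the cell size by the number of cells. The sketch below also converts lengths to millimetres. That conversion assumes λ = 0.4 µm, the value quoted later for a 0.8 µm process, purely as an example, since this section does not fix the process.

```python
CELL_H, CELL_W = 412, 260      # digital unit cell including tally wiring (lambda)
ADDER_H, ADDER_W = 134, 196    # one full adder cell (lambda)
M = 24                         # 24 x 24 array
TREE_ADDERS = 133              # 1-bit adders in the row-summation tree

array_h, array_w = M * CELL_H, M * CELL_W           # 9888 x 6240 lambda
tree_area_min = TREE_ADDERS * ADDER_H * ADDER_W     # lower bound, routing ignored

def to_mm(length_lambda, lambda_um):
    """Convert a length in lambda to millimetres for a given lambda (in um)."""
    return length_lambda * lambda_um / 1000.0

# Example only: lambda = 0.4 um is an assumption for this section.
print(array_h, array_w, tree_area_min)
print(round(to_mm(array_h, 0.4), 3), round(to_mm(array_w, 0.4), 3))
```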

It should be noted that the tally circuit just described uses ripple carry, and will thus 
have worst-case propagation delays proportional to the square of the number of levels in 
the summation tree, which in this case is five. The circuit can be made to run faster using 
carry-lookahead adders. However, these require more area, and because their structure is 
not as regular as that of the ripple carry adder, it would not be as simple to construct a 
unit cell for the array such as the one in Figure 15-4. 

15.2 Second Method: One Processor Per Offset 

The second architecture for implementing the matching circuit is interesting both because
it can operate much faster than the first method, and because it has been used in the
design of some motion estimation chips currently on the market.

The basic idea of this design is illustrated in the block diagram of Figure 15-6. The array
consists of a regular arrangement of processors, each corresponding to a given offset in the
search window. Each processor consists of an XOR circuit to compute the score for one
pixel in the base image block, a counter/accumulator to compute the total score, and a shift
register cell to temporarily store one pixel from the search window, as shown in Figure 15-7.
The array is initialized by resetting all of the accumulators and loading the entire search
window into the column store blocks placed between the columns of the processor array. In
each processing cycle, one pixel from the base image block is broadcast on a global bus, and
its difference is computed simultaneously with every pixel from the search window.

Figure 15-4: Layout of a unit cell including one full adder circuit.
Figure 15-5: Circuit diagram for the layout of Figure 15-4.
Figure 15-6: Offset processor array.

The base image pixels are sequenced by column, such that pixels (0, i) through (M − 1, i)
from column i are processed in order, followed by pixel (0, i + 1) from the top of the next
column. At the beginning of each column, the contents of the column stores, each of which
holds one column from the search window, are copied into the vertical shift registers linking
the processor nodes. After each cycle, the search window pixels are shifted up one row so
that their offset relative to the incoming base pixel corresponds to the offset assigned to their
new location. Once the last pixel in the column from the base image has been processed, the
contents of the column store blocks are shifted horizontally, and the procedure is repeated
until the entire block has been processed. Assuming the processor array contains W₁ × W₂
nodes, the minimum score and its offset can be found, once processing is completed, using
a comparator tree with at most 2ⁿ − 1 comparators, where n = ⌈log₂(W₁W₂)⌉.

Figure 15-8: Layout of 8-bit counter and accumulator with inhibit on overflow.
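A behavioral sketch of this second architecture shows why it needs only one cycle per base pixel. The Python/NumPy function below is hypothetical. It broadcasts base pixels in column order and lets every node accumulate its XOR vote in parallel. It models "inhibit on overflow" as saturation at 2⁸ − 1. The vertical and horizontal shifting of the search window is replaced by direct indexing, and `argmin` stands in for the comparator tree.

```python
import numpy as np

def offset_array_search(base, window, acc_bits=8):
    """Behavioral model of the one-processor-per-offset array (hypothetical).

    Base pixels are broadcast one per cycle, column by column. Node (u, v)
    XORs the broadcast bit with the search-window pixel currently at its
    offset and adds the vote to an accumulator that saturates at its maximum
    value (approximating inhibit on overflow). The shift-register movement
    is abstracted away by indexing the window directly.
    """
    base = np.asarray(base, dtype=np.uint8)
    window = np.asarray(window, dtype=np.uint8)
    m = base.shape[0]
    w1 = window.shape[0] - m + 1
    w2 = window.shape[1] - m + 1
    limit = 2 ** acc_bits - 1
    acc = np.zeros((w1, w2), dtype=np.int64)
    cycles = 0
    for col in range(m):                 # base pixels sequenced by column...
        for row in range(m):             # ...from the top of each column
            bit = base[row, col]         # broadcast on the global bus
            seen = window[row:row + w1, col:col + w2]   # pixel at each node
            acc = np.minimum(acc + (seen ^ bit), limit)
            cycles += 1
    # A comparator tree would reduce the W1*W2 scores; argmin stands in for it.
    u, v = np.unravel_index(np.argmin(acc), acc.shape)
    return acc, (int(u), int(v)), cycles
```

The returned cycle count is M², independent of the number of offsets. The first architecture, by contrast, needs one tally cycle per offset.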

This design is clearly much faster than the one discussed in the preceding section, because
the number of cycles needed to find the best offset is equal to the number of pixels in the
base image block, instead of the much larger number of offsets within the search window.
The price paid for this speed, however, is silicon area. As discussed in the last chapter,
the accumulators at each node need to represent scores with 8 bits of precision, and the
search window needs to be large enough to accommodate the typical displacements caused by
the camera motion. It was estimated that the required size could be between 30x30 and
200x200 pixels.

A possible layout for an 8-bit counter and accumulator which could be used in the design
is shown in Figure 15-8; it corresponds to the section of the block diagram in Figure 15-7
below the XOR circuit. This cell measures 647λ(h) × 387λ(w) and includes a circuit to
inhibit counting if the carry-out bit of the accumulator goes high, signalling an overflow.
In this design, the 8-bit counter is implemented with two 4-bit counters, each containing
four 1-bit half-adders with carry-lookahead. Some area can be saved by removing the carry
propagation circuits (which can be seen in the layout as the rightmost elements of the
1-bit subcells) and connecting the carry-out bit directly to the input of the next cell. This
would decrease the width of the cell by 86λ, but would, of course, also reduce its operating
speed. Based on the size of the counter/accumulator alone, ignoring the space needed by
the column store and shift register subcells, we can see that a minimum-size 30x30 array
would require at least 19410λ(h) × 11610λ(w). Using a 0.8 µm (λ = 0.4 µm) or smaller
process, we could conceivably build one 30x30 processor array on a single 1 cm die.
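The die-size estimate can be reproduced as follows. Like the text, this sketch counts only the counter/accumulator cells and ignores the column stores and shift registers.

```python
LAMBDA_UM = 0.4              # 0.8 um process, lambda = 0.4 um (from the text)
CELL_H, CELL_W = 647, 387    # 8-bit counter/accumulator cell (lambda)
CARRY_SAVING = 86            # width saved by dropping the carry circuits (lambda)
N = 30                       # minimum 30 x 30 offset array

h_lambda, w_lambda = N * CELL_H, N * CELL_W          # 19410 x 11610
h_mm = h_lambda * LAMBDA_UM / 1000.0                 # about 7.76 mm
w_mm = w_lambda * LAMBDA_UM / 1000.0                 # about 4.64 mm
w_ripple_lambda = N * (CELL_W - CARRY_SAVING)        # narrower, slower variant
fits_on_1cm_die = h_mm < 10.0 and w_mm < 10.0        # counters only

print(h_lambda, w_lambda, round(h_mm, 2), round(w_mm, 2), fits_on_1cm_die)
```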



Chapter 16 



Mixing Analog and Digital 



Of the two architectures discussed in the last chapter for a purely digital design, only the 
first one readily lends itself to analog processing. In the second architecture, in which the 
scores for each offset in the search window are computed simultaneously and accumulated as 
the base image block is read in, the principal processing element is the counter/accumulator. 
If designed with analog circuits, each node would need to include the following functions: 

1. Scaling, to convert the binary output of the XOR circuit to a usably small unit voltage 
or current, 

2. Addition of the new input to the previous value of the score, and 

3. Storage of the result. 

Since any mechanism to 'store' current also requires storing a control voltage, it is sim- 
pler to build an analog counter/accumulator which would operate entirely in the voltage 
domain. Adding voltages would require an opamp circuit with matched resistive elements, 
as well as low-output-impedance buffers to make the inputs appear as ideal voltage sources.
Furthermore, care would need to be taken to ensure that the individual processors are 
matched to better than ±3% of their full scale range in order to meet the precision re- 
quirements outlined in Section 14.3. Even if it is possible to design the processors to these 
specifications, they will still require substantial silicon area, and will certainly be more 
expensive to fabricate than their digital equivalents. 

In the first architecture, on the other hand, it is much less difficult to implement the
tally function with the required precision using simple analog circuits. Votes can be counted
by switching on current sources at each node when the output of the XOR circuit goes high.
The currents from each node can then be summed on a single wire and the result converted
to a voltage, so that the total score can be compared with, and possibly replace, the stored
minimum score.

Figure 16-1: Processing array with both analog and digital elements.

The basic plan for a processor to implement the matching procedure using both analog 
and digital elements is outlined in the block diagram of Figure 16-1. This diagram is func- 
tionally identical to that of Figure 15-1. However, the row tally blocks and the summation 
circuit have been replaced by wires, and the comparison functions included in the blocks 
marked 'Min score & offset', 'Threshold test', and 'Validate' must now be implemented by 
analog circuits. 

In this chapter, I will discuss the design of the principal elements in Figure 16-1 which
involve analog processing and analyze their performance as predicted by simulations, using
the device parameters for the HP CMOS26 process, also offered through MOSIS. Layouts
were generated for each of these elements in order to compare total area requirements with
those of the corresponding digital implementation. Simulation results based on a circuit
extraction from the layout of a 5x5 array are given in the last section to illustrate the
ability of the mixed analog and digital processor to find an artificial test pattern in a 9x9
search window.

16.1 Unit Cell Design 

The primary change to the unit cell is the addition of switched current sources to replace
the full adder circuits used in the digital design. It was determined that better matching
for the purposes of the threshold test could be achieved with less complexity if two
identical current sources were placed at each cell, one to be switched on when the value of
s ⊕ b is high, and the other when the value of b is high. The resulting circuit diagram and
layout for the unit cell are shown in Figures 16-2 and 16-3. The left three-quarters of the
layout, containing the s-shift cell, b-latch and XOR circuit, are identical to the top half of
the layout for the digital cell shown in Figure 15-4. The two current sources, which occupy
the right one-fourth of the cell, add 80λ to its width, so that the total cell measures
161λ(h) × 340λ(w), as opposed to the 412λ(h) × 260λ(w) used in the digital design. Including
the analog current sources thus reduces the cell area by almost a factor of two. Given that the
adder tree needed to sum the results from all of the rows is also no longer necessary, the
area required by the full array is only about one-third of that needed by the purely digital
implementation.

Figure 16-3: Unit cell layout.
Figure 16-4: Transient behavior of switched current sources.
Figure 16-5: Output current vs. load voltage for unit cell current sources.

Three considerations were important in determining the design of the current sources 
used in the unit cells. The first, of course, was to achieve good matching. The second goal 
was to maximize the range of load voltages over which the sources would behave as ideal 
elements. Finally, it was important to minimize the output rise time when the sources were 
switched on in order to increase operating speeds. 

Variations in the W/L ratios of the transistors are the only source of mismatch which can 
be directly influenced by the layout. The width and length of all transistors were thus made 
as large as possible so that minor process variations in line widths would have a smaller 
percentage effect on the actual dimensions. Increasing L also helps improve the ideality of 
the current sources as it decreases channel length modulation and therefore increases the 
output resistance. 

A simple p-type current mirror driven by a biased NMOS transistor was chosen for
the design because it requires minimal area and allows a maximum load voltage of
V_dd − |V_GS − V_T|. The bias transistor was sized at W/L = 12λ/24λ to give 3.37 µA of current
for a 1.2 V input bias. This value was chosen so that if all 576 current sources of the same
type in the 24x24 array were switched on, the maximum output current which would need
to be handled would be approximately 2 mA. Since each source dissipates 33 µW of power
when on, the maximum power dissipation in the full array with all current sources on is
38 mW. The sources are turned on when the gate voltages on the two transistors connected
in series between the bias transistor and the current mirror are brought high. The gate of
the transistor closest to the p-fet is connected to the control input, i.e., b or s ⊕ b, while
the gate of the other switch transistor is connected to a signal, labeled here as CLK, which
periodically goes high.
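The current and power figures above are easy to check. The 38 mW total is consistent with both current sources in every cell (the b source and the s ⊕ b source) being on at once. This sketch makes that assumption explicit.

```python
I_UNIT_UA = 3.37          # current of one switched source (uA)
P_SOURCE_UW = 33.0        # power of one source when on (uW)
NODES = 24 * 24           # cells in the array; each cell has two sources

i_one_type_ma = NODES * I_UNIT_UA / 1000.0        # all b (or all s xor b) sources on
p_all_on_mw = 2 * NODES * P_SOURCE_UW / 1000.0    # assumption: both sources on in every cell

print(round(i_one_type_ma, 2), round(p_all_on_mw, 1))   # prints: 1.94 38.0
```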

The size of the current mirror transistors was determined both by the rise time requirements
and by the need to maximize both the load voltage range and the output resistance. For a
given current, small values of |V_GS − V_T| are achieved by making W/L large, while large
output resistance requires large values of L. Fast rise times, however, are achieved by
reducing the gate capacitance of the mirror transistors, which needs to be charged when the
current source is switched on. An appropriate compromise between these conflicting needs
was obtained by choosing W/L = 6λ/20λ. As can be seen from the simulation results plotted
in Figures 16-4 and 16-5, the current sources as designed behave ideally up to load voltages
of 4.2 V, while the output rises to within 1% of its final value in 160 ns.

The dashed lines in Figure 16-4 indicate the change in output current for a change of
±5% in the W/L ratio of either the bias transistor or one of the mirror transistors. The
resulting variation in output current is also approximately ±5%, from 3.55 µA to 3.20 µA. By
using large values for both W and L in all of these transistors, however, it is hoped that
the standard deviation of W/L variations will be much less than 5%. Furthermore, to the extent
that mismatches in the current sources are random and have zero mean, summing the
outputs from different sources will tend to cancel individual variations. The net effect of
mismatch on the precision of the matching circuit should thus be minimal.
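The argument that random mismatch averages out can be illustrated with a small Monte Carlo sketch. It assumes independent, zero-mean Gaussian errors with a relative standard deviation of 5%, the pessimistic ±5% case above. Real mismatch may also contain systematic components, such as gradients across the die, that do not average out. The function name is hypothetical.

```python
import numpy as np

def summed_current_spread(n_sources, sigma_rel, trials=4000, seed=0):
    """Spread of the summed output of n nominally identical current sources.

    Each source delivers I_unit * (1 + e), with e drawn independently from a
    zero-mean normal distribution of standard deviation sigma_rel (an assumed
    mismatch model). Returns (absolute spread in units of I_unit,
    spread relative to the nominal total n * I_unit).
    """
    rng = np.random.default_rng(seed)
    currents = 1.0 + sigma_rel * rng.standard_normal((trials, n_sources))
    totals = currents.sum(axis=1)
    spread = float(totals.std())
    return spread, spread / n_sources

for n in (1, 16, 144, 576):
    abs_spread, rel_spread = summed_current_spread(n, 0.05)
    print(n, round(abs_spread, 2), round(rel_spread, 4))
```

Under these assumptions, the absolute spread of the total grows only as √N, while the relative spread falls as 1/√N. For the 144 sources that can be on for a candidate score, the absolute spread is roughly 0.6 of one source current, i.e. less than one vote.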

16.2 Global Test Circuits 

To complete the matching procedure, the scores generated at each offset must be compared
with both the threshold α₁‖B‖ and the current minimum score. Since the minimum
score must be stored as a voltage, it is best to convert the currents to voltages at this point,
as indicated by the circled resistive elements in Figure 16-1, at the same time scaling the
total current I_b from the base edge pixels by a factor of α₁ with respect to the total score
current I_y. Representing α₁‖B‖ by the voltage α₁V_b also simplifies the block validation
test (14.15), as the values of V_h and V_l can then be supplied as external voltages which can
be adjusted as needed.

Analog circuits are thus required outside the unit cells for current scaling and conversion, 
as well as for comparing and storing voltages. Since the comparator outputs are necessarily 
binary signals and since the search window offset values must clearly be represented digitally, 
the remaining functions needed in the procedure of storing candidate offsets and computing 
the maximum spread must be performed with digital circuits. 

16.2.1 Current scaling and conversion 

The circuits used for converting the currents I_b and I_y to the voltages α₁V_b and V_y
are shown in Figures 16-6a and 16-6b. The only difference between these circuits is the size
of the diode-connected input transistor, which is twice as wide for I_b as for I_y. Since
numerous simulations of the matching procedure on test image sequences have shown that the
best results are obtained by setting α₁ to its maximum value of 0.5, this value was hardwired
into the circuit. The current into the opposing transistor of the n-type current mirror in
Figure 16-6a is thus equal to I_b/2, to within the accuracy allowed by the fabrication.

In order for the output voltages to be within the range required by the comparator
circuits, discussed in the following section, the currents I_b/2 and I_y are then pulled through
a second p-type current mirror whose output branch feeds into a diode-connected n-fet
serving as the 'resistor'. The fact that this resistor is nonlinear is of no importance to the
design, since comparing the values of I_b/2 and I_y only requires that it be monotonic. It is
very important, however, for the two voltage conversion circuits to be well matched. Large-
geometry transistors were thus used to reduce the percentage mismatch due to line width
variations in fabrication, and long channels, L > 10λ, were used on each of the transistors to
reduce channel length modulation. Because of the large current-carrying capacity required
to accommodate the maximum possible current of about 2 mA, however, it was necessary to use
large W/L ratios, which reduces the output resistance r_o and thus increases the output
nonlinearity.

The simulated current divider characteristic for the I_b circuit is shown in Figure 16-7.
The curve is linear with a slope of exactly 1/2 for input currents up to 1.2 mA. Beyond
this point, however, the slope decreases considerably as the transistor driving the p-type
current mirror is pushed into the triode region. Since I_b must be less than I_max/2 in
order to pass the block validation test, however, we are only concerned with the behavior
of the divider up to inputs of 1 mA.

The I-V characteristic of the complete I_b circuit for inputs from 0 to 1 mA is shown in
Figure 16-8, while a similar curve for the I_y circuit is given in Figure 16-9, but with a
different input axis. For the I_y circuit, the output voltage is plotted as a function of the
number of processors responding, obtained by dividing the total current by 3.37 µA, the unit
current from one source. For a score to pass the threshold test, it must be the case that
I_y < α₁I_b < P/4 (with currents expressed in units of the single-source current), and hence
the maximum number of responding nodes for the score to be a candidate is 576/4 = 144.

Figure 16-7: I_b current divider characteristic.
Figure 16-8: Simulated I–V characteristic for computing I_b → α₁V_b (V_b in volts vs. I_b in mA, 0 to 1 mA; annotated slope = 2.16 kΩ).
Figure 16-9: Output score voltage vs. number of processors with non-matching edge pixels (panel title: Volts vs. Number of Sources On; horizontal axis: number of votes).
Figure 16-10: Rise and fall times of V_y for a 0.3 mA input after switching on current
sources and before turning on reset transistor.

As can be seen from the diagrams, the output voltages for both the I_b and I_y circuits
are between 1 V and 3.2 V when the inputs are within the acceptable range. The slope of
the curve for the I_y circuit in its least steep portion is 11 mV/vote. Since, according to the
precision requirements of Section 14.3, it is necessary to discriminate between scores that
differ by more than 3 or 4 votes, the resolution of the comparator circuit should be better
than 40 mV.
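The two numbers used above, the 144-vote limit and the comparator resolution, follow from short arithmetic. The sketch takes P = 576, α₁ = 0.5 and the 11 mV/vote worst-case slope from the text.

```python
P = 576                   # votes (pixels) in a 24 x 24 block
ALPHA1 = 0.5              # hardwired divider ratio
SLOPE_MV_PER_VOTE = 11.0  # least steep portion of the I_y -> V_y curve

# A valid block has I_b < P/2, so a candidate needs I_y < alpha1 * I_b < P/4.
max_candidate_votes = int(ALPHA1 * P / 2)

def required_resolution_mv(votes_apart, slope=SLOPE_MV_PER_VOTE):
    """Voltage difference between two scores that are votes_apart votes apart."""
    return votes_apart * slope

print(max_candidate_votes)                                    # prints: 144
print(required_resolution_mv(3), required_resolution_mv(4))   # prints: 33.0 44.0
```

A separation of 3 to 4 votes corresponds to 33 mV to 44 mV on the flattest part of the curve. This is why the comparator needs a resolution better than 40 mV.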

The transient behavior of the I_y circuit for a 0.3 mA ramped input, corresponding
roughly to the rise time characteristic of the node sources, is shown in Figure 16-10. The
output voltage is able to follow the rising input without appreciable delay, so that after
170 ns the output is stable at its final voltage. When the node sources are switched off,
however, the fall time is much slower. As the gate voltage on the diode-connected
transistor decreases, there is less current to discharge the gate, and when the output voltage
reaches the transistor threshold, the drain current becomes negligible. It is thus necessary
to connect an additional reset transistor to bring the output back to zero at each cycle.
In Figures 16-6a and 16-6b, this reset transistor is shown with its gate connected to the
periodic waveform φ₁. Timing of the different clock signals used in the matching circuit is
discussed in Section 16.3.

16.2.2 Validation and V_min tests

The remaining analog circuit needed is the voltage comparator for performing the validation
tests and for finding the minimum score. For simplicity, a single design, shown inside
the dashed box of Figure 16-11, was used for all of the tests. The two input voltages,
indicated as In₁ and In₂, are connected to the gates of two identical p-type source followers
and to 1.13 pF capacitors when the signal CLK (the same signal that switches on the node
current sources) goes high. The reset transistors are turned on once at the very beginning
of the matching procedure to charge the capacitors to an initial high value. This operation
is necessary to initialize the comparator in the V_min circuit, but is irrelevant for the other
validation tests.
Figure 16-13: Input/output characteristic for source followers used in comparator circuits (panel title: Source Follower Characteristic).



The comparator operation is driven by the three clock signals R₁, R₂, and R₃, whose
timing relative to the CLK signal is shown in Figure 16-16. Initially, these signals are all
high while the output voltages α₁V_b and V_y rise to their final values. When CLK is brought
low, the capacitors and source follower inputs are isolated from the outputs of the I_b and
I_y conversion circuits. R₁, R₂, and R₃ are brought low at the same time as CLK so that
the outputs of the source followers can charge the gates of the n-type half latch and the two
inverters.

The simulated V_in–V_out characteristic of the source followers with a bias voltage V_bh of
3.2 V is shown in Figure 16-13. The source followers serve two purposes in the comparator
circuit. The first is to shift the input voltages upward so that the minimum input value is at
least 1 V above the threshold of the latch transistors. The second is to buffer the input values
with a stable source that does not require intermediate switches. When R₁ is brought high,
the two latch transistors are both turned on. The one with the higher gate voltage, however,
will have the higher current, and thus its drain voltage will drop more quickly than that of
the other transistor. When one of the drain voltages drops below |V_in − V_T|, where V_in is
the input voltage on the source follower connected to that side, the lower p-fet of the source
follower will turn off, and the bias transistor will supply the current needed to drive the
drain all the way to ground. The opposing latch transistor, whose gate is connected to the
drain which goes to 0 V, will be turned off, and the output of the source follower connected
to its drain will return to its value prior to bringing R₁ high. The capacitors holding the
input voltages to the source followers are sized large enough that any coupling through
the gate-source capacitance C_gs of the lower p-fets will be negligible.

Once the gate voltages of the latch transistors are stable, the signals R₂ and R₃, which
connect the outputs of the two inverters to the cross-coupled latch transistors, are brought
high. In order to avoid metastable states, these two signals are staggered so that the
comparator output cannot 'hang' if the two input voltages are identical. R₂ is brought high
before R₃ so that if the drain voltage on the left-hand side is slightly above the inversion
threshold, the right-hand side will be brought to ground, while if it is below the inversion
threshold, the opposite side will go to V_dd. Once R₃ is brought high, the comparison
operation is complete, and the outputs Q and Q̄ are binary signals such that Q = V_dd if
In₂ > In₁ and Q̄ = V_dd if In₁ > In₂.

The design of the comparator was based primarily on the needs of the V_min circuit and
is more than adequate for the validation tests comparing α₁V_b to V_y, and to the upper and
lower limits V_h and V_l. The V_min circuit is special, however, in that only one input voltage
is supplied. The current minimum score is stored on one of the 1.13 pF capacitors and is
compared with each input value. If the new value is lower, it must then be stored as the
minimum score, and the circuit must signal that the minimum has changed so that the new
offset position can be latched.

The circuit configurations for performing these operations are shown in Figures 16-11
and 16-12. The input is gated to one side or the other of the comparator based on the results
of the last test by connecting the outputs Q and Q̄ to the gates of two pass transistors. If
Q is high, the input on the right side was greater than that on the left in the last compare.
The next score voltage, V_y, will thus be gated to the In₂ input and will be compared with
the value still held on the capacitor on the In₁ side. If the new value is less than the old
stored value, Q̄ will go high, causing the next input to be gated to the In₁ side, while the
new minimum value is stored at In₂.
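The ping-pong operation of the V_min circuit can be summarized with a behavioral model. This Python sketch is hypothetical and ignores all analog effects. The two capacitors are list entries, and the pass transistors are represented by choosing the side that does not hold the current minimum. `toggle` is raised whenever the minimum moves, which is when the offset would be latched.

```python
def track_minimum(scores, reset_value=float("inf")):
    """Behavioral model of the two-capacitor V_min circuit (hypothetical names).

    Side 0 stands for In1 and side 1 for In2. Both capacitors start at a high
    reset value. Each new score is gated to the side that does NOT hold the
    current minimum and compared with it; if it is lower, the comparator
    outputs flip, the minimum now lives on the new side, and `toggle` is
    raised so that the offset can be latched. Ties keep the stored value.
    """
    caps = [reset_value, reset_value]
    min_side = 0
    best_index = None
    toggles = []
    for index, score in enumerate(scores):
        free_side = 1 - min_side          # chosen by the pass transistors (Q, Q-bar)
        caps[free_side] = score           # V_y sampled onto that capacitor
        toggle = caps[free_side] < caps[min_side]
        if toggle:
            min_side = free_side          # new minimum stays where it was sampled
            best_index = index            # offset latched on toggle
        toggles.append(toggle)
    return caps[min_side], best_index, toggles

value, where, toggles = track_minimum([5, 7, 3, 3, 9, 1])
print(value, where, toggles)   # prints: 1 5 [True, False, True, False, False, True]
```

In this model, equal scores leave the stored minimum in place. In the circuit, the outcome for identical inputs is set by the staggered R₂/R₃ timing instead.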

The toggle circuit shown in Figure 16-12 is used to indicate a change in V_min. The
output Q is connected to an inverter via a pass transistor controlled by the signal CLK.
When CLK is low, which is the case at the end of a compare operation, the inverter holds
the value of Q from the previous compare. Connecting the comparator outputs to the XOR
circuit as shown thus makes toggle the exclusive-OR of the current and previous values of Q
while CLK is low. If the minimum value changes, the output signal toggle will remain high
until CLK again goes high.

Figure 16-14: Test pattern for simulating the 5x5 array.

16.3 Test Pattern Simulation 



To test the operation of the complete matching circuit in finding the best offset for an
actual edge pattern, a full simulation was conducted with a 5x5 array and a 9x9 search
window. The test patterns used in the simulation for the base edge block and the search
window are shown in Figure 16-14. At the correct offset, indicated by the dashed lines, the
5x5 edge pattern in the search window matches that of the test block exactly, except for
one pixel in the lower left-hand corner. To find the best offset, however, the matching circuit
must test each of the 25 different offset positions in which the 5x5 block is entirely
contained in the search window and correctly find the minimum score at the indicated
location.

Figure 16-15: Layout of 5x5 array.
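A software analogue of this test is sketched below. The actual patterns of Figure 16-14 are not reproduced here. Instead, the code generates a hypothetical random 9x9 window and copies a 5x5 block at an arbitrary offset. It flips the lower left-hand pixel of that block and then scores all 25 offsets as the array does. The seed and offset are illustrative choices.

```python
import numpy as np

def make_test_case(m=5, w=9, offset=(2, 2), seed=7):
    """Hypothetical stand-in for the Figure 16-14 patterns (not the real ones).

    A random w x w edge window is generated; the m x m base block copies the
    window at `offset`, with its lower left-hand pixel flipped so that the
    correct match differs in exactly one pixel.
    """
    rng = np.random.default_rng(seed)
    window = rng.integers(0, 2, size=(w, w)).astype(np.uint8)
    r, c = offset
    base = window[r:r + m, c:c + m].copy()
    base[m - 1, 0] ^= 1
    return base, window

def scores_all_offsets(base, window):
    """Mismatch counts for every offset where the block lies inside the window."""
    m, n = base.shape[0], window.shape[0] - base.shape[0] + 1
    return np.array([[int((base ^ window[u:u + m, v:v + m]).sum())
                      for v in range(n)] for u in range(n)])

base, window = make_test_case()
scores = scores_all_offsets(base, window)
best = np.unravel_index(np.argmin(scores), scores.shape)
print(scores.size, scores.min(), best)
```

The correct offset scores exactly one mismatch. With random patterns, the other offsets almost always score far higher, so the minimum lands at the chosen offset.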

A layout for the full circuit is shown in Figure 16-15 and includes the 5x5 matching
array, the I_b and I_y scaling and voltage conversion circuits, the V_y–α₁V_b comparator, and
the V_min circuit, as well as an additional four rows containing a 4x5 s-shift array so that
an entire column of the search window can be loaded in one input cycle. The metal lines
running horizontally at the base and top of the array carry the necessary control signals
and bias voltages for operating the matching circuit. The layout shown measures 2007λ(h)
× 1779λ(w), of which 608λ in height are occupied by the optional 4x5 s-shift array and
172λ in height are taken by the conversion and validation circuits.

The waveforms for the principal periodic control signals are shown in Figure 16-16. The simulation was based on a 40ns minimum pulse width, with 400ns required to process each offset. The clock signals φ_1 and φ_2 control vertical movement on the shift register, while clocks φ_3 and φ_4 control horizontal movement. The shift sequence is such that the block is first aligned with a particular column in the search window, after which each row offset is processed sequentially. Once the last row offset has been processed, the entire search window is shifted horizontally by one column, with a new column being read in as the farthest one on the block is shifted out. For the present configuration, which has five row offsets per column, there are thus four vertical shifts for every horizontal shift. The 400ns processing time for each offset consists of 160ns to complete each shift (since the clock phases cannot overlap), 200ns to allow adequate settling of the node current sources, and 40ns to reset the clock signals R_1, R_2, and R_3 used in the comparator circuits.

Figure 16-17: V_T and α_1 V_B outputs sampled at the end of each cycle.
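The timing budget and shift sequence described above reduce to simple arithmetic. The Python sketch below takes the three timing components from the text. The function names are hypothetical, and it assumes every offset costs the same 400ns.

```python
# Per-offset timing components quoted in the text (nanoseconds).
SHIFT_NS = 160    # one shift, with non-overlapping clock phases
SETTLE_NS = 200   # settling of the node current sources
RESET_NS = 40     # resetting comparator clocks R1, R2, R3


def per_offset_ns():
    """Processing time for one offset."""
    return SHIFT_NS + SETTLE_NS + RESET_NS


def shifts_per_column(row_offsets):
    """(vertical, horizontal) shifts spent on one column of the scan: a
    vertical shift between consecutive row offsets, then one horizontal
    shift that brings in the next column of the search window."""
    return row_offsets - 1, 1


def search_time_ns(row_offsets, col_offsets):
    """Total search time, assuming every offset costs per_offset_ns()."""
    return row_offsets * col_offsets * per_offset_ns()
```

For the simulated 5x5 configuration, shifts_per_column(5) gives (4, 1) and search_time_ns(5, 5) gives 10,000ns. Reading a 50x50 search as 2500 offsets, search_time_ns(50, 50) gives 1,000,000ns, the 1ms figure used in the comparison with digital architectures.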

The results of the complete simulation are plotted in Figures 16-17 through 16-19. The first diagram, which shows the peak values of α_1 V_B and V_T sampled at the end of each processing cycle, clearly shows that shift position #13 corresponds to the minimum score. The minimum value of V_T is not only significantly less than all of the other scores, but it is also the only one which is less than α_1 V_B. The V_T - α_1 V_B comparator output, plotted in Figure 16-18, confirms this, as offset #13 is the only one to produce a high output. The result of the V_min circuit, given by the toggle output plotted in Figure 16-19, shows that the stored minimum changes a few times at first, with toggle rising for the last time when the correct offset is reached.

Figure 16-18: Response of threshold test circuit at each offset of test pattern.

16.4 Comparison with Digital Architectures

The simulation results from the test pattern indicate that the mixed analog and digital matching circuit functions correctly according to its design. It requires 400ns to process each offset and can thus search a 50x50 window in 1ms (2500 offset positions at 400ns each). A 24x24 array processor, including the comparator and V_min circuits, would occupy 4432λ (h) × 8160λ (w), based on the layouts shown for the unit cell and 5x5 array. In a 0.8µm (λ = 0.4µm) process, one matching circuit would require 1.77mm × 3.26mm, and thus eight individual circuits could comfortably fit on a single 1cm die.
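The millimetre figures follow from multiplying the layout dimensions by λ. This short Python sketch assumes λ = 0.4µm, as stated for the 0.8µm process. The helper names are hypothetical, and the digital footprint adds the 400λ summation width to the 6240λ array width, as described in the next paragraph.

```python
LAMBDA_UM = 0.4  # lambda for the 0.8 um process quoted in the text


def lambda_to_mm(n_lambda, lambda_um=LAMBDA_UM):
    """Convert a layout dimension in lambda units to millimetres."""
    return n_lambda * lambda_um / 1000.0


def footprint_mm(height_lambda, width_lambda):
    """(height, width) of a rectangular layout in millimetres."""
    return lambda_to_mm(height_lambda), lambda_to_mm(width_lambda)


# 24x24 mixed analog/digital processor, including comparator and V_min.
analog = footprint_mm(4432, 8160)          # about 1.77 mm x 3.26 mm
# 24x24 digital array plus the row summation circuit (no comparator/V_min).
digital = footprint_mm(9888, 6240 + 400)   # about 3.96 mm x 2.66 mm
```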

In the last chapter, it was estimated that the corresponding 24x24 digital processor required 9888λ (h) × 6240λ (w) for the array alone, and at least another 400λ in width to accommodate the summation circuit which adds the results from each row. Digital versions of the comparator and V_min circuits were not designed or laid out for this processor; however, the area taken by these components should also be considered. In a 0.8µm process, the array and summation circuits together require at least 3.95mm × 2.66mm, and thus, at best, four processors might fit on a 1cm die. The time required per offset was not estimated for the digital processor. However, it is not at all clear that it could be operated faster than the mixed analog/digital design. Since the same time (200ns) is required by both circuits for shifting the search window, the difference in their speeds is determined by the time required to compute each score and compare it to the minimum value. For the fully digital processor to be faster, the tally circuit and V_min comparator would have to be designed to complete these operations in less than the 200ns required by the analog circuit.

The second digital architecture studied is much faster than the first one, as its total processing time is determined by the size of the block being matched and not that of the search window. Furthermore, the computations for each edge pixel only require updating the 8-bit accumulators at each array node, and thus the delays are much shorter. The major disadvantages of this design are its large size and the fact that the dimensions of the array limit the maximum search area. It was estimated at the end of the last chapter that one could at best fit a single 30x30 array on a 1cm die.

In summary, the mixed analog/digital matching processor does appear to best meet the needs of the motion estimation system as they have been formulated. This processor has an 8-to-1 area advantage over the digital circuits used for motion estimation in commercial systems (i.e., the second architecture) and does not restrict the search window size. It has at least a 2-to-1 area advantage over its direct digital equivalent and can be operated at comparable, or better, speeds. The maximum power dissipation of each 24x24 processor during the 200ns computation cycle is estimated at < 40mW in the worst possible case, when all internal current sources are turned on.

Chapter 17



Recommendations for Putting It All Together 



It is appropriate to conclude by combining the results of the previous chapters into a 
plan for the complete system design. One possible configuration of a single 'board' motion 
estimation system, including the multi-scale veto processor and the analog/digital edge 
matching circuits developed in this thesis, is shown in Figure 17-1. From the analysis of 
Chapter 8, we know that the spatial resolution of the image sensor must be sufficiently fine 
so that, given the block size used in the matching procedure, each block can be reasonably 
approximated as a single point. Tests with real image sequences showed that the minimum sensor size needed to obtain accurate motion estimates is approximately 256x256 pixels.

Following the recommendations of Chapter 13, it should be possible, with a 0.8µm or better CCD/CMOS process, to build a 256x256 MSV focal-plane processor which will perform both imaging and edge detection. The advantage of the focal-plane processor is that it avoids the signal degradation incurred in loading the image one pixel at a time through the fill-and-spill input structure and transferring the charges over the length of the gate array. The improved design does not, however, offer any savings in total processing time, as the time saved in loading the image is more than consumed by sequentially performing the differencing and threshold tests for each row and column. Assuming the processor is operated at 5MHz, and including a normal image acquisition time of 1ms, it should take approximately 5ms for the edge detector to process each frame and deliver the binary edge outputs to the memory buffers.

As discussed in Section 11.5.2, most of the power required to operate a CCD array is dissipated in the clock drivers. To minimize power requirements, a separate chip containing specially designed drivers, tuned for the capacitive and inductive loads of the system, should be included to supply the clock waveforms for the MSV processor. Tuned drivers can reduce power dissipation to 1/Q_f of its untuned value, where Q_f is the quality factor of the oscillator.

Figure 17-1: Single 'board' motion estimation system.

To obtain accurate motion estimates, it is essential to try to match a large number of 
blocks in order to ensure that a sufficient number of correspondence points will be found. 
In the numerous tests performed with real image sequences, it was often observed that the 
hit rate, i.e., the fraction of blocks having acceptable matches, was very low in ordinary 
scenes due to a combination of the lack of distinctive features and the frequent occurrence 
of repeating patterns which give multiple candidate matches. It was typical to obtain only 
30-50 correspondences out of more than 400 blocks tested. 

It would not be practical to include 400 matching circuits on the motion system board, for reasons of both size and power consumption, even if 8 individual circuits are contained in each chip. The proposed configuration of Figure 17-1 holds four chips, such that a total of 32 blocks can be matched simultaneously. By pipelining the matching operations, 448 blocks can be tested in 14 cycles (14 × 32 = 448). At the end of Chapter 16, it was calculated that a 50x50 window could be searched in 1ms. Taking this as the average window size, the entire matching process can thus be completed in 14ms plus some additional time to manage operational overhead. Assuming that imaging and edge detection require 5ms, and estimating that roughly 10ms are needed both to solve the motion equations and to perform other housekeeping tasks, the complete system should be able to process each image pair in just over 29ms and thus achieve a throughput of slightly better than 30 frames/sec.
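The frame-time estimate above can be reproduced directly. This Python sketch takes the block count (448), the four chips with eight circuits each, the 1ms window search, and the 5ms and 10ms allowances from the text. The function name is hypothetical, and it ignores the unspecified operational overhead, which is why it yields exactly 29ms rather than "just over".

```python
import math


def frame_time_ms(n_blocks, blocks_in_parallel, search_ms, edge_ms, other_ms):
    """Return (matching cycles, total ms per image pair, frames per second),
    ignoring the operational overhead that the text leaves unquantified."""
    cycles = math.ceil(n_blocks / blocks_in_parallel)
    total_ms = edge_ms + cycles * search_ms + other_ms
    return cycles, total_ms, 1000.0 / total_ms


cycles, total_ms, fps = frame_time_ms(
    448, 4 * 8, search_ms=1.0, edge_ms=5.0, other_ms=10.0
)
```

This gives 14 matching cycles, 29ms per image pair and about 34.5 frames/sec before overhead, consistent with the "slightly better than 30 frames/sec" estimate once overhead is added.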

The final important component of the system is the micro-controller which sequences the 
operations on the board and also performs the motion computations. This component was 
hardly covered in this thesis as there are many commercially available digital processors 
which could be used for this purpose. It is important, however, to choose one which is 
adequate for the task but does not include unnecessary functions which will increase power 
consumption, and possibly the cost of the processor itself. 

There are, of course, many issues left to be resolved to complete the motion system. The proposed new design of the MSV edge detector should be fabricated in a better-controlled, smaller-scale CCD process and then tested to evaluate its performance. Full-size matching circuits should also be fabricated to compare their actual performance with that predicted by simulation. The tuned clock driver chip for controlling the MSV processor must be designed, and an adequate, but minimal, microprocessor should be chosen to control the system.

Constructing the full system is outside the scope of any one thesis. The major contribution of this research has been to thoroughly analyze the theoretical and algorithmic constraints imposed on the system and, given these constraints, to carefully study the design of its two most important components. The results developed in this thesis thus lay the foundation for further work toward building the complete motion estimation system according to the goals set for its design.

Appendix A



Quaternion Algebra 



Quaternions are vectors in $\mathbb{R}^4$ which may be thought of as the composition of a scalar and a 'vector' part [4]:

$$\mathring{a} = (a_0, \mathbf{a}) \tag{A.1}$$

where $\mathbf{a} = (a_x, a_y, a_z)^T$ is a vector in $\mathbb{R}^3$.
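Conjugation is defined by

$$\mathring{a}^* = (a_0, -\mathbf{a}) \tag{A.2}$$

and the multiplication of two quaternions by

$$\mathring{a}\mathring{b} = \left(a_0 b_0 - \mathbf{a}\cdot\mathbf{b},\; a_0\mathbf{b} + b_0\mathbf{a} + \mathbf{a}\times\mathbf{b}\right) \tag{A.3}$$

Quaternion multiplication is associative, but not commutative. The identity with respect to multiplication is

$$\mathring{e} = (1, \mathbf{0}) \tag{A.4}$$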

In all other respects, quaternions may be treated as ordinary vectors. The transpose, dot product, and multiplication by a matrix are defined in the usual manner. One way to express the product (A.3) is as a matrix-vector multiplication using equivalent quaternion matrices. For example,

$$\mathring{a}\mathring{b} = \begin{pmatrix} a_0 & -a_x & -a_y & -a_z \\ a_x & a_0 & -a_z & a_y \\ a_y & a_z & a_0 & -a_x \\ a_z & -a_y & a_x & a_0 \end{pmatrix}\mathring{b} = \mathbb{A}\,\mathring{b} \tag{A.5}$$

and

$$\mathring{a}\mathring{b} = \begin{pmatrix} b_0 & -b_x & -b_y & -b_z \\ b_x & b_0 & b_z & -b_y \\ b_y & -b_z & b_0 & b_x \\ b_z & b_y & -b_x & b_0 \end{pmatrix}\mathring{a} = \overline{\mathbb{B}}\,\mathring{a} \tag{A.6}$$
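$\mathbb{A}$ is referred to as the left quaternion matrix associated with $\mathring{a}$, and $\overline{\mathbb{B}}$ is referred to as the right quaternion matrix associated with $\mathring{b}$. Quaternion matrices are useful for rearranging formulas into more convenient expressions. Using either (A.3) or the matrix formulations (A.5) and (A.6), the following identities can easily be shown:

$$\mathring{a}\mathring{a}^* = (\mathring{a}\cdot\mathring{a})\,\mathring{e} \tag{A.7}$$

$$(\mathring{a}\mathring{b})^* = \mathring{b}^*\mathring{a}^* \tag{A.8}$$

$$(\mathring{a}\mathring{q})\cdot(\mathring{b}\mathring{q}) = (\mathring{a}\cdot\mathring{b})(\mathring{q}\cdot\mathring{q}) \tag{A.9}$$

$$(\mathring{a}\mathring{q})\cdot\mathring{b} = \mathring{a}\cdot(\mathring{b}\mathring{q}^*) \tag{A.10}$$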

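A vector in $\mathbb{R}^3$ can be represented by a quaternion with zero scalar part. If $\mathring{r}$, $\mathring{\ell}$, and $\mathring{b}$ are quaternions with zero scalar part, then

$$\mathring{r}^* = -\mathring{r} \tag{A.11}$$

$$\mathring{r}\cdot\mathring{\ell} = \mathbf{r}\cdot\boldsymbol{\ell} \tag{A.12}$$

$$\mathring{r}\mathring{\ell} = \left(-\mathbf{r}\cdot\boldsymbol{\ell},\; \mathbf{r}\times\boldsymbol{\ell}\right) \tag{A.13}$$

$$\mathring{b}\cdot(\mathring{r}\mathring{\ell}) = \mathbf{b}\cdot(\mathbf{r}\times\boldsymbol{\ell}) \tag{A.14}$$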



The last identity above is the quaternion representation of the triple product. 
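The identities above can be spot-checked numerically. The Python sketch below is an illustration, not part of the original derivation. It represents a quaternion as a 4-tuple (scalar, x, y, z) and implements the product (A.3), the conjugate (A.2), and the left and right matrices (A.5) and (A.6). The function names are hypothetical, and integer test values keep every check exact.

```python
def qmul(a, b):
    """Quaternion product (A.3); a quaternion is a 4-tuple (scalar, x, y, z)."""
    a0, ax, ay, az = a
    b0, bx, by, bz = b
    return (a0 * b0 - ax * bx - ay * by - az * bz,
            a0 * bx + b0 * ax + ay * bz - az * by,
            a0 * by + b0 * ay + az * bx - ax * bz,
            a0 * bz + b0 * az + ax * by - ay * bx)


def conj(a):
    """Conjugate (A.2): negate the vector part."""
    return (a[0], -a[1], -a[2], -a[3])


def qdot(a, b):
    """Ordinary dot product of two quaternions viewed as 4-vectors."""
    return sum(x * y for x, y in zip(a, b))


def left_matrix(a):
    """Left quaternion matrix of (A.5): a*b equals left_matrix(a) times b."""
    a0, ax, ay, az = a
    return [[a0, -ax, -ay, -az],
            [ax, a0, -az, ay],
            [ay, az, a0, -ax],
            [az, -ay, ax, a0]]


def right_matrix(b):
    """Right quaternion matrix of (A.6): a*b equals right_matrix(b) times a."""
    b0, bx, by, bz = b
    return [[b0, -bx, -by, -bz],
            [bx, b0, bz, -by],
            [by, -bz, b0, bx],
            [bz, by, -bx, b0]]


def matvec(m, v):
    """Matrix (list of rows) times vector, returned as a tuple."""
    return tuple(sum(x * y for x, y in zip(row, v)) for row in m)


def triple(b, r, l):
    """Scalar triple product b . (r x l) of ordinary 3-vectors."""
    cx = r[1] * l[2] - r[2] * l[1]
    cy = r[2] * l[0] - r[0] * l[2]
    cz = r[0] * l[1] - r[1] * l[0]
    return b[0] * cx + b[1] * cy + b[2] * cz
```

For example, with a = (1, 2, 3, 4) and b = (5, -6, 7, -8), qmul(a, b), matvec(left_matrix(a), b) and matvec(right_matrix(b), a) all agree, while qmul(b, a) differs, reflecting non-commutativity.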

Quaternions are useful chiefly because of the simplicity with which rotation about an arbitrary axis can be represented. A unit quaternion is one whose magnitude, defined as the square root of its dot product with itself, is unity:

$$\mathring{q}\cdot\mathring{q} = 1 \tag{A.15}$$



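Every unit quaternion represents a rotation in $\mathbb{R}^3$ by an angle $\theta$ about an axis $\hat{\boldsymbol{\omega}}$, in the sense that

$$\mathring{q} = \left(\cos\frac{\theta}{2},\; \hat{\boldsymbol{\omega}}\sin\frac{\theta}{2}\right) \tag{A.16}$$

The rotational transformation of a vector $\boldsymbol{\ell}$ is found from

$$\mathring{\ell}' = \mathring{q}\,\mathring{\ell}\,\mathring{q}^* \tag{A.17}$$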



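where $\mathring{\ell}'$ is the quaternion representation of the rotated vector $\boldsymbol{\ell}'$. Using quaternion matrices, we can also write (A.17) as

$$\mathring{\ell}' = \mathbb{Q}\,\overline{\mathbb{Q}}^T\mathring{\ell} = \begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & R \end{pmatrix}\mathring{\ell} \tag{A.18}$$

where $R$ is an orthonormal rotation matrix. Expanding terms we find

$$R = \begin{pmatrix} q_0^2 + q_x^2 - q_y^2 - q_z^2 & 2(q_xq_y - q_0q_z) & 2(q_xq_z + q_0q_y) \\ 2(q_xq_y + q_0q_z) & q_0^2 - q_x^2 + q_y^2 - q_z^2 & 2(q_yq_z - q_0q_x) \\ 2(q_xq_z - q_0q_y) & 2(q_yq_z + q_0q_x) & q_0^2 - q_x^2 - q_y^2 + q_z^2 \end{pmatrix} \tag{A.19}$$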

If we are given an orthonormal rotation matrix $R$, the corresponding unit quaternion can be found by noting that

$$q_0^2 = \cos^2\frac{\theta}{2} = \frac{\operatorname{Tr}(R) + 1}{4} \tag{A.20}$$

and by solving the eigenvector equation

$$R\,\hat{\boldsymbol{\omega}} = \hat{\boldsymbol{\omega}} \tag{A.21}$$
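As a sketch of how (A.16)-(A.21) fit together, the Python code below builds a unit quaternion from an axis and angle, rotates a vector with $\mathring{q}\mathring{\ell}\mathring{q}^*$, forms $R$ from (A.19), and recovers $\theta$ from the trace as in (A.20). The function names are hypothetical. Clamping the trace expression to [0, 1] is an added guard against rounding, and this recovery returns $\theta$ in $[0, \pi]$ only.

```python
import math


def qmul(a, b):
    """Quaternion product (A.3); quaternions are 4-tuples (scalar, x, y, z)."""
    a0, ax, ay, az = a
    b0, bx, by, bz = b
    return (a0 * b0 - ax * bx - ay * by - az * bz,
            a0 * bx + b0 * ax + ay * bz - az * by,
            a0 * by + b0 * ay + az * bx - ax * bz,
            a0 * bz + b0 * az + ax * by - ay * bx)


def conj(a):
    """Conjugate (A.2)."""
    return (a[0], -a[1], -a[2], -a[3])


def axis_angle_quaternion(axis, theta):
    """Unit quaternion of (A.16) for a rotation by theta about a unit axis."""
    s = math.sin(theta / 2)
    return (math.cos(theta / 2), axis[0] * s, axis[1] * s, axis[2] * s)


def rotate(q, v):
    """Rotate the 3-vector v with (A.17): embed v as (0, v), form q v q*."""
    return qmul(qmul(q, (0.0, *v)), conj(q))[1:]


def rotation_matrix(q):
    """The orthonormal matrix R of (A.19)."""
    q0, qx, qy, qz = q
    return [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy), 2*(qy*qz + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]


def angle_from_trace(R):
    """theta in [0, pi] from (A.20): cos^2(theta/2) = (Tr(R) + 1) / 4."""
    c2 = (R[0][0] + R[1][1] + R[2][2] + 1) / 4
    return 2 * math.acos(math.sqrt(min(1.0, max(0.0, c2))))
```

For any unit axis and angle, rotate(q, v) should agree with R times v, and R should leave the axis unchanged, as (A.21) requires.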



Additional results on quaternions and their properties can be found in [88] and [4]. 



Appendix B 



Special Integrals 



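In the analysis of the numerical stability and error sensitivity of the motion estimates, it is necessary to compute the following integrals:

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\,\rho\,d\rho\,d\alpha \tag{B.1}$$

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^4\,\rho\,d\rho\,d\alpha \tag{B.2}$$

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \tag{B.3}$$

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \tag{B.4}$$

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \tag{B.5}$$

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \tag{B.6}$$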

where $\mathbf{u}$, $\mathbf{u}_1$, and $\mathbf{u}_2$ are arbitrary vectors and

$$\boldsymbol{\ell}' = R\,\boldsymbol{\ell} \tag{B.7}$$

with $\boldsymbol{\ell}$ being the vector from the center of projection to a point $(\rho\cos\alpha, \rho\sin\alpha)$ on the image plane in the left camera system,

$$\boldsymbol{\ell} = \begin{pmatrix} \rho\cos\alpha \\ \rho\sin\alpha \\ 1 \end{pmatrix} \tag{B.8}$$

as shown in Figure 8-1.

We assume that the vectors $\boldsymbol{\ell}$ are dense over the cone defined by the field of view, so that we can replace summations by integrals and thereby obtain an analytical expression. In this Appendix, I will derive the solutions to the integrals (B.1)-(B.6) which are used in Chapter 8.

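B.1 $\iint |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\,\rho\,d\rho\,d\alpha$

Since $|\boldsymbol{\ell}'|^2 = |\boldsymbol{\ell}|^2$, we have from (B.8)

$$|\boldsymbol{\ell}'|^2 = \boldsymbol{\ell}^T\boldsymbol{\ell} = 1 + \rho^2 \tag{B.9}$$

The rotation matrix $R$ is constant and can therefore be taken outside the integral. We can thus write

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\,\rho\,d\rho\,d\alpha = R\int_0^D\!\!\int_0^{2\pi} (1 + \rho^2)\begin{pmatrix} \rho\cos\alpha \\ \rho\sin\alpha \\ 1 \end{pmatrix}\rho\,d\rho\,d\alpha = 2\pi R\,\hat{\mathbf{z}}\int_0^D (\rho + \rho^3)\,d\rho \tag{B.10}$$

where $\hat{\mathbf{z}} = (0, 0, 1)^T$; the $\cos\alpha$ and $\sin\alpha$ components vanish when integrated over a full period.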



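The rotation is now taken into account by writing $R$ as

$$R = \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{pmatrix} \tag{B.11}$$

where the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, and $\mathbf{v}_3$ form an orthonormal triad. It is useful to note that $\mathbf{v}_3 = R\hat{\mathbf{z}}$ represents the rotation of the optical axis in the left camera system. We thus have

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\,\rho\,d\rho\,d\alpha = \pi D^2\left(1 + \frac{D^2}{2}\right)\mathbf{v}_3 \tag{B.12}$$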
B.2 $\iint |\boldsymbol{\ell}'|^4\,\rho\,d\rho\,d\alpha$

Direct multiplication gives

$$|\boldsymbol{\ell}'|^4 = |\boldsymbol{\ell}|^4 = 1 + 2\rho^2 + \rho^4 \tag{B.13}$$

and the integral is straightforward:

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^4\,\rho\,d\rho\,d\alpha = 2\pi\int_0^D (\rho + 2\rho^3 + \rho^5)\,d\rho = \pi D^2\left(1 + D^2 + \frac{D^4}{3}\right) \tag{B.14}$$



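B.3 $\iint \boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha$

We start by writing $\boldsymbol{\ell}'\boldsymbol{\ell}'^T$ as

$$\boldsymbol{\ell}'\boldsymbol{\ell}'^T = R\,\boldsymbol{\ell}\boldsymbol{\ell}^T R^T \tag{B.15}$$

with

$$\boldsymbol{\ell}\boldsymbol{\ell}^T = \begin{pmatrix} \rho^2\cos^2\alpha & \rho^2\cos\alpha\sin\alpha & \rho\cos\alpha \\ \rho^2\cos\alpha\sin\alpha & \rho^2\sin^2\alpha & \rho\sin\alpha \\ \rho\cos\alpha & \rho\sin\alpha & 1 \end{pmatrix} \tag{B.16}$$

Moving the terms $R$ and $R^T$ outside the integral and integrating over $\alpha$ we obtain

$$\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T d\alpha = \begin{pmatrix} \pi\rho^2 & 0 & 0 \\ 0 & \pi\rho^2 & 0 \\ 0 & 0 & 2\pi \end{pmatrix} \tag{B.17}$$

The integral over $\rho$ then gives

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha = \begin{pmatrix} \pi D^4/4 & 0 & 0 \\ 0 & \pi D^4/4 & 0 \\ 0 & 0 & \pi D^2 \end{pmatrix} \tag{B.18}$$

$$= \pi D^2\left[\frac{D^2}{4}\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right) + \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right] \tag{B.19}$$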
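Combining (B.19) and (B.11) we obtain finally

$$\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = R\left(\int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha\right)R^T = \pi D^2\left[\frac{D^2}{4}\left(I - \mathbf{v}_3\mathbf{v}_3^T\right) + \mathbf{v}_3\mathbf{v}_3^T\right] \tag{B.20}$$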
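The closed forms (B.12), (B.14) and (B.20) can be checked against a direct numerical evaluation of (B.1)-(B.3). The Python sketch below uses a midpoint rule over $\rho$ and $\alpha$ with $\boldsymbol{\ell}' = R\boldsymbol{\ell}$. The rotation (built from a unit quaternion via (A.19)), the value $D = 0.5$, and all function names are illustrative choices, not values from the text.

```python
import math


def rotation_from_quaternion(q0, qx, qy, qz):
    """Rotation matrix R of (A.19) for a unit quaternion."""
    return [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy), 2*(qy*qz + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]


def numeric_b1_b2_b3(R, D, n_rho=400, n_alpha=32):
    """Midpoint-rule estimates of (B.1), (B.2) and (B.3), with l' = R l and
    l = (rho cos(alpha), rho sin(alpha), 1)."""
    b1 = [0.0, 0.0, 0.0]
    b2 = 0.0
    b3 = [[0.0] * 3 for _ in range(3)]
    h_rho, h_alpha = D / n_rho, 2 * math.pi / n_alpha
    for i in range(n_rho):
        rho = (i + 0.5) * h_rho
        for j in range(n_alpha):
            alpha = (j + 0.5) * h_alpha
            ell = [rho * math.cos(alpha), rho * math.sin(alpha), 1.0]
            lp = [sum(R[r][k] * ell[k] for k in range(3)) for r in range(3)]
            w = rho * h_rho * h_alpha        # area element rho drho dalpha
            n2 = sum(x * x for x in lp)      # |l'|^2
            b2 += w * n2 * n2
            for r in range(3):
                b1[r] += w * n2 * lp[r]
                for c in range(3):
                    b3[r][c] += w * lp[r] * lp[c]
    return b1, b2, b3


def closed_b12_b14_b20(R, D):
    """Closed forms (B.12), (B.14) and (B.20)."""
    v3 = [R[0][2], R[1][2], R[2][2]]         # third column of R, as in (B.11)
    k = math.pi * D**2
    b12 = [k * (1 + D**2 / 2) * x for x in v3]
    b14 = k * (1 + D**2 + D**4 / 3)
    b20 = [[k * (D**2 / 4 * (float(r == c) - v3[r] * v3[c]) + v3[r] * v3[c])
            for c in range(3)] for r in range(3)]
    return b12, b14, b20
```

The midpoint rule is exact in $\alpha$ for these low-order trigonometric integrands, so the remaining discrepancy comes only from the $\rho$ discretisation and shrinks quadratically with n_rho.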



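B.4 $\iint |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha$

The only difference between this integral and the previous one is the presence of the term $|\boldsymbol{\ell}'|^2 = \boldsymbol{\ell}^T\boldsymbol{\ell} = 1 + \rho^2$. Since this term does not depend on $\alpha$, we can proceed as before to find

$$\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T d\alpha = \begin{pmatrix} \pi(\rho^2 + \rho^4) & 0 & 0 \\ 0 & \pi(\rho^2 + \rho^4) & 0 \\ 0 & 0 & 2\pi(1 + \rho^2) \end{pmatrix} \tag{B.21}$$

Integrating over $\rho$ we then have

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha = \begin{pmatrix} \pi\left(\frac{D^4}{4} + \frac{D^6}{6}\right) & 0 & 0 \\ 0 & \pi\left(\frac{D^4}{4} + \frac{D^6}{6}\right) & 0 \\ 0 & 0 & \pi D^2\left(1 + \frac{D^2}{2}\right) \end{pmatrix} = \pi D^2\left[\left(\frac{D^2}{4} + \frac{D^4}{6}\right)\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right) + \left(1 + \frac{D^2}{2}\right)\hat{\mathbf{z}}\hat{\mathbf{z}}^T\right] \tag{B.22}$$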



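Including the rotation, we obtain

$$\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}'|^2\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = R\left(\int_0^D\!\!\int_0^{2\pi} |\boldsymbol{\ell}|^2\,\boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha\right)R^T = \pi D^2\left[\left(\frac{D^2}{4} + \frac{D^4}{6}\right)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right) + \left(1 + \frac{D^2}{2}\right)\mathbf{v}_3\mathbf{v}_3^T\right] \tag{B.23}$$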
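B.5 $\iint (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha$

This integral is somewhat more tedious than the previous ones because the $(\mathbf{u}\cdot\boldsymbol{\ell}')$ term does depend on $\alpha$. We start by defining

$$\mathbf{u}' = \begin{pmatrix} u'_x \\ u'_y \\ u'_z \end{pmatrix} = R^T\mathbf{u} \tag{B.24}$$

and note that from the definition (B.11)

$$u'_x = \mathbf{u}\cdot\mathbf{v}_1, \quad u'_y = \mathbf{u}\cdot\mathbf{v}_2, \quad u'_z = \mathbf{u}\cdot\mathbf{v}_3 \tag{B.25}$$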



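Since $\mathbf{u}\cdot\boldsymbol{\ell}' = \mathbf{u}'\cdot\boldsymbol{\ell}$, we can write

$$\mathbf{u}\cdot\boldsymbol{\ell}' = u'_x\rho\cos\alpha + u'_y\rho\sin\alpha + u'_z \tag{B.26}$$

This scalar term multiplies each element of the matrix $\boldsymbol{\ell}\boldsymbol{\ell}^T$, producing many products of trigonometric terms. Fortunately, only a few of these are non-zero after integrating over $\alpha$ from 0 to $2\pi$. We have

$$\int_0^{2\pi} (\mathbf{u}'\cdot\boldsymbol{\ell})\,\boldsymbol{\ell}\boldsymbol{\ell}^T d\alpha = \pi\begin{pmatrix} u'_z\rho^2 & 0 & u'_x\rho^2 \\ 0 & u'_z\rho^2 & u'_y\rho^2 \\ u'_x\rho^2 & u'_y\rho^2 & 2u'_z \end{pmatrix}$$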

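and performing the integration over $\rho$, we obtain

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = \pi D^2 R\begin{pmatrix} u'_z D^2/4 & 0 & u'_x D^2/4 \\ 0 & u'_z D^2/4 & u'_y D^2/4 \\ u'_x D^2/4 & u'_y D^2/4 & u'_z \end{pmatrix}R^T \tag{B.27}$$

$$= \pi D^2 R\left[u'_z\frac{D^2}{4}I + u'_z\left(1 - \frac{3D^2}{4}\right)\hat{\mathbf{z}}\hat{\mathbf{z}}^T + \frac{D^2}{4}\left(\mathbf{u}'\hat{\mathbf{z}}^T + \hat{\mathbf{z}}\mathbf{u}'^T\right)\right]R^T$$

$$= \pi D^2\left[(\mathbf{u}\cdot\mathbf{v}_3)\frac{D^2}{4}I + (\mathbf{u}\cdot\mathbf{v}_3)\left(1 - \frac{3D^2}{4}\right)\mathbf{v}_3\mathbf{v}_3^T + \frac{D^2}{4}\left(\mathbf{u}\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}^T\right)\right] \tag{B.28}$$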
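There is a special case which is worth noting. If $\mathbf{u} = \mathbf{v}_3$, equation (B.28) becomes

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{v}_3\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = \pi D^2\left[\frac{D^2}{4}I + \left(1 - \frac{D^2}{4}\right)\mathbf{v}_3\mathbf{v}_3^T\right] = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \tag{B.29}$$

Multiplying the integrand by $(\mathbf{v}_3\cdot\boldsymbol{\ell}')$ thus has no effect on the result. This is expected, since $\mathbf{v}_3\cdot\boldsymbol{\ell}' = (R\hat{\mathbf{z}})\cdot(R\boldsymbol{\ell}) = \hat{\mathbf{z}}\cdot\boldsymbol{\ell} = 1$ for every point on the image plane.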



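B.6 $\iint (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha$

In this final integral even more complex trigonometric products are encountered. Expanding the term $(\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')$ we have

$$\begin{aligned} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}') &= \mathbf{u}_1'^T\boldsymbol{\ell}\boldsymbol{\ell}^T\mathbf{u}_2' \\ &= u'_{1x}u'_{2x}\rho^2\cos^2\alpha + u'_{1y}u'_{2y}\rho^2\sin^2\alpha + u'_{1z}u'_{2z} \\ &\quad + (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})\rho^2\cos\alpha\sin\alpha + (u'_{1x}u'_{2z} + u'_{2x}u'_{1z})\rho\cos\alpha \\ &\quad + (u'_{1y}u'_{2z} + u'_{2y}u'_{1z})\rho\sin\alpha \end{aligned} \tag{B.30}$$

where $\mathbf{u}_1' = R^T\mathbf{u}_1$ and $\mathbf{u}_2' = R^T\mathbf{u}_2$. The integral is written as

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = R\left(\int_0^D\!\!\int_0^{2\pi} \left(\mathbf{u}_1'^T\boldsymbol{\ell}\boldsymbol{\ell}^T\mathbf{u}_2'\right)\boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha\right)R^T \tag{B.31}$$

so that each element of the matrix $\boldsymbol{\ell}\boldsymbol{\ell}^T$ is multiplied by the scalar $\mathbf{u}_1'^T\boldsymbol{\ell}\boldsymbol{\ell}^T\mathbf{u}_2'$. The only trigonometric products which survive the integration over $\alpha$, however, are the following:

$$\int_0^{2\pi} \cos^4\alpha\,d\alpha = \frac{3\pi}{4} \tag{B.32}$$

$$\int_0^{2\pi} \sin^4\alpha\,d\alpha = \frac{3\pi}{4} \tag{B.33}$$

$$\int_0^{2\pi} \cos^2\alpha\sin^2\alpha\,d\alpha = \frac{\pi}{4} \tag{B.34}$$

$$\int_0^{2\pi} \cos^2\alpha\,d\alpha = \pi \tag{B.35}$$

$$\int_0^{2\pi} \sin^2\alpha\,d\alpha = \pi \tag{B.36}$$

$$\int_0^{2\pi} d\alpha = 2\pi \tag{B.37}$$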



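We now define the following integrals:

$$L_1 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho^2\cos^2\alpha\,\rho\,d\rho\,d\alpha = \frac{\pi D^4}{4}\begin{pmatrix} D^2/2 & 0 & 0 \\ 0 & D^2/6 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{B.38}$$

$$L_2 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho^2\sin^2\alpha\,\rho\,d\rho\,d\alpha = \frac{\pi D^4}{4}\begin{pmatrix} D^2/6 & 0 & 0 \\ 0 & D^2/2 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{B.39}$$

$$L_3 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho\,d\rho\,d\alpha = \pi D^2\begin{pmatrix} D^2/4 & 0 & 0 \\ 0 & D^2/4 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{B.40}$$

$$L_4 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho^2\cos\alpha\sin\alpha\,\rho\,d\rho\,d\alpha = \frac{\pi D^6}{24}\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{B.41}$$

$$L_5 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho\cos\alpha\,\rho\,d\rho\,d\alpha = \frac{\pi D^4}{4}\begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix} \tag{B.42}$$

$$L_6 = \int_0^D\!\!\int_0^{2\pi} \boldsymbol{\ell}\boldsymbol{\ell}^T\rho\sin\alpha\,\rho\,d\rho\,d\alpha = \frac{\pi D^4}{4}\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \tag{B.43}$$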



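so that

$$\int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha = R\begin{pmatrix} \mathbf{u}_1'^T L_1\mathbf{u}_2' & \mathbf{u}_1'^T L_4\mathbf{u}_2' & \mathbf{u}_1'^T L_5\mathbf{u}_2' \\ \mathbf{u}_1'^T L_4\mathbf{u}_2' & \mathbf{u}_1'^T L_2\mathbf{u}_2' & \mathbf{u}_1'^T L_6\mathbf{u}_2' \\ \mathbf{u}_1'^T L_5\mathbf{u}_2' & \mathbf{u}_1'^T L_6\mathbf{u}_2' & \mathbf{u}_1'^T L_3\mathbf{u}_2' \end{pmatrix}R^T \tag{B.44}$$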



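Using the definitions (B.38)-(B.43), we expand the elements of this matrix, grouping terms by powers of $D$, to arrive at

$$\begin{aligned} \int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha ={}& \frac{\pi D^6}{8}R\begin{pmatrix} u'_{1x}u'_{2x} + u'_{1y}u'_{2y}/3 & (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & 0 \\ (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & u'_{1x}u'_{2x}/3 + u'_{1y}u'_{2y} & 0 \\ 0 & 0 & 0 \end{pmatrix}R^T \\ &+ \frac{\pi D^4}{4}R\begin{pmatrix} u'_{1z}u'_{2z} & 0 & u'_{1x}u'_{2z} + u'_{2x}u'_{1z} \\ 0 & u'_{1z}u'_{2z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} \\ u'_{1x}u'_{2z} + u'_{2x}u'_{1z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} & \mathbf{u}_1'\cdot\mathbf{u}_2' \end{pmatrix}R^T \\ &+ \pi D^2\left(1 - \frac{D^2}{4}\right)(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\,\mathbf{v}_3\mathbf{v}_3^T \end{aligned} \tag{B.45}$$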



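We can write the first of these matrices as

$$\begin{aligned} \frac{\pi D^6}{8}R\begin{pmatrix} u'_{1x}u'_{2x} + u'_{1y}u'_{2y}/3 & (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & 0 \\ (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & u'_{1x}u'_{2x}/3 + u'_{1y}u'_{2y} & 0 \\ 0 & 0 & 0 \end{pmatrix}R^T &= \frac{\pi D^6}{24}R\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right)\left[\mathbf{u}_1'\mathbf{u}_2'^T + \mathbf{u}_2'\mathbf{u}_1'^T + \left((\mathbf{u}_1'\cdot\mathbf{u}_2') - u'_{1z}u'_{2z}\right)I\right]\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right)R^T \\ &= \frac{\pi D^6}{24}\left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\left[\mathbf{u}_1\mathbf{u}_2^T + \mathbf{u}_2\mathbf{u}_1^T + \left((\mathbf{u}_1\cdot\mathbf{u}_2) - (\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\right)I\right]\left(I - \mathbf{v}_3\mathbf{v}_3^T\right) \end{aligned} \tag{B.46}$$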
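To simplify notation, we define the vectors $\mathbf{w}_1$ and $\mathbf{w}_2$ as

$$\mathbf{w}_1 = (\mathbf{v}_3\times\mathbf{u}_1)\times\mathbf{v}_3 = \left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\mathbf{u}_1, \qquad \mathbf{w}_2 = (\mathbf{v}_3\times\mathbf{u}_2)\times\mathbf{v}_3 = \left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\mathbf{u}_2 \tag{B.47}$$

Since $\mathbf{v}_3\cdot\mathbf{v}_3 = 1$, we also note that

$$\mathbf{w}_1\cdot\mathbf{w}_2 = \mathbf{u}_1\cdot\mathbf{u}_2 - (\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3) \tag{B.48}$$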



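Equation (B.46) can thus be reduced to

$$\frac{\pi D^6}{8}R\begin{pmatrix} u'_{1x}u'_{2x} + u'_{1y}u'_{2y}/3 & (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & 0 \\ (u'_{1x}u'_{2y} + u'_{2x}u'_{1y})/3 & u'_{1x}u'_{2x}/3 + u'_{1y}u'_{2y} & 0 \\ 0 & 0 & 0 \end{pmatrix}R^T = \frac{\pi D^6}{24}\left[\mathbf{w}_1\mathbf{w}_2^T + \mathbf{w}_2\mathbf{w}_1^T + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\right] \tag{B.49}$$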



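The second matrix of equation (B.45) can be written as

$$\begin{aligned} \frac{\pi D^4}{4}R\begin{pmatrix} u'_{1z}u'_{2z} & 0 & u'_{1x}u'_{2z} + u'_{2x}u'_{1z} \\ 0 & u'_{1z}u'_{2z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} \\ u'_{1x}u'_{2z} + u'_{2x}u'_{1z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} & \mathbf{u}_1'\cdot\mathbf{u}_2' \end{pmatrix}R^T &= \frac{\pi D^4}{4}R\Big[u'_{2z}\left(\mathbf{u}_1'\hat{\mathbf{z}}^T + \hat{\mathbf{z}}\mathbf{u}_1'^T\right) + u'_{1z}\left(\mathbf{u}_2'\hat{\mathbf{z}}^T + \hat{\mathbf{z}}\mathbf{u}_2'^T\right) \\ &\qquad - 4u'_{1z}u'_{2z}\hat{\mathbf{z}}\hat{\mathbf{z}}^T + u'_{1z}u'_{2z}\left(I - \hat{\mathbf{z}}\hat{\mathbf{z}}^T\right) + (\mathbf{u}_1'\cdot\mathbf{u}_2')\hat{\mathbf{z}}\hat{\mathbf{z}}^T\Big]R^T \\ &= \frac{\pi D^4}{4}\Big[(\mathbf{u}_2\cdot\mathbf{v}_3)\left(\mathbf{u}_1\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}_1^T\right) + (\mathbf{u}_1\cdot\mathbf{v}_3)\left(\mathbf{u}_2\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}_2^T\right) \\ &\qquad - 4(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\mathbf{v}_3\mathbf{v}_3^T + (\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right) + (\mathbf{u}_1\cdot\mathbf{u}_2)\mathbf{v}_3\mathbf{v}_3^T\Big] \end{aligned} \tag{B.50}$$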



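From (B.47) we can derive the following expression for $\mathbf{w}_1\mathbf{w}_2^T + \mathbf{w}_2\mathbf{w}_1^T$:

$$\mathbf{w}_1\mathbf{w}_2^T + \mathbf{w}_2\mathbf{w}_1^T = \mathbf{u}_1\mathbf{u}_2^T + \mathbf{u}_2\mathbf{u}_1^T - (\mathbf{u}_2\cdot\mathbf{v}_3)\left(\mathbf{u}_1\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}_1^T\right) - (\mathbf{u}_1\cdot\mathbf{v}_3)\left(\mathbf{u}_2\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}_2^T\right) + 2(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\mathbf{v}_3\mathbf{v}_3^T \tag{B.51}$$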
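Then, using (B.48) and (B.51), we can rewrite equation (B.50) as:

$$\begin{aligned} \frac{\pi D^4}{4}R\begin{pmatrix} u'_{1z}u'_{2z} & 0 & u'_{1x}u'_{2z} + u'_{2x}u'_{1z} \\ 0 & u'_{1z}u'_{2z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} \\ u'_{1x}u'_{2z} + u'_{2x}u'_{1z} & u'_{1y}u'_{2z} + u'_{2y}u'_{1z} & \mathbf{u}_1'\cdot\mathbf{u}_2' \end{pmatrix}R^T &= \frac{\pi D^4}{4}\Big[\mathbf{u}_1\mathbf{u}_2^T + \mathbf{u}_2\mathbf{u}_1^T - \mathbf{w}_1\mathbf{w}_2^T - \mathbf{w}_2\mathbf{w}_1^T - 2(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\mathbf{v}_3\mathbf{v}_3^T \\ &\qquad + (\mathbf{u}_1\cdot\mathbf{u}_2 - \mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right) + (\mathbf{u}_1\cdot\mathbf{u}_2)\mathbf{v}_3\mathbf{v}_3^T\Big] \\ &= \frac{\pi D^4}{4}\Big[\mathbf{u}_1\mathbf{u}_2^T + \mathbf{u}_2\mathbf{u}_1^T + (\mathbf{u}_1\cdot\mathbf{u}_2)I - 2(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\mathbf{v}_3\mathbf{v}_3^T \\ &\qquad - \left(\mathbf{w}_1\mathbf{w}_2^T + \mathbf{w}_2\mathbf{w}_1^T + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\right)\Big] \end{aligned} \tag{B.52}$$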

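Finally, we combine equations (B.49) and (B.52) with the third term of equation (B.45) to obtain the result:

$$\begin{aligned} \int_0^D\!\!\int_0^{2\pi} (\mathbf{u}_1\cdot\boldsymbol{\ell}')(\mathbf{u}_2\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha ={}& \frac{\pi D^4}{4}\left(\frac{D^2}{6} - 1\right)\left[\mathbf{w}_1\mathbf{w}_2^T + \mathbf{w}_2\mathbf{w}_1^T + (\mathbf{w}_1\cdot\mathbf{w}_2)\left(I - \mathbf{v}_3\mathbf{v}_3^T\right)\right] \\ &+ \frac{\pi D^4}{4}\left[\mathbf{u}_1\mathbf{u}_2^T + \mathbf{u}_2\mathbf{u}_1^T + (\mathbf{u}_1\cdot\mathbf{u}_2)I\right] \\ &+ \pi D^2\left(1 - \frac{3D^2}{4}\right)(\mathbf{u}_1\cdot\mathbf{v}_3)(\mathbf{u}_2\cdot\mathbf{v}_3)\,\mathbf{v}_3\mathbf{v}_3^T \end{aligned} \tag{B.53}$$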



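We note that a special case occurs if either $\mathbf{u}_1$ or $\mathbf{u}_2$ is equal to $\mathbf{v}_3$. The solution then becomes

$$\begin{aligned} \int_0^D\!\!\int_0^{2\pi} (\mathbf{v}_3\cdot\boldsymbol{\ell}')(\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha &= \pi D^2\left[(\mathbf{u}\cdot\mathbf{v}_3)\frac{D^2}{4}I + \left(1 - \frac{3D^2}{4}\right)(\mathbf{u}\cdot\mathbf{v}_3)\mathbf{v}_3\mathbf{v}_3^T + \frac{D^2}{4}\left(\mathbf{u}\mathbf{v}_3^T + \mathbf{v}_3\mathbf{u}^T\right)\right] \\ &= \int_0^D\!\!\int_0^{2\pi} (\mathbf{u}\cdot\boldsymbol{\ell}')\,\boldsymbol{\ell}'\boldsymbol{\ell}'^T\rho\,d\rho\,d\alpha \end{aligned} \tag{B.54}$$

Again, multiplying the integrand by $(\mathbf{v}_3\cdot\boldsymbol{\ell}')$ has no effect on the result.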
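Because (B.53) is the most involved result in this Appendix, a numerical cross-check is worthwhile. The Python sketch below evaluates (B.6) by a midpoint rule and compares it with the closed form (B.53), building $\mathbf{w}_1$ and $\mathbf{w}_2$ from (B.47). The rotation, the vectors $\mathbf{u}_1$, $\mathbf{u}_2$, the value $D = 0.5$, and the function names are arbitrary illustrative choices.

```python
import math


def rotation_from_quaternion(q0, qx, qy, qz):
    """Rotation matrix R of (A.19) for a unit quaternion."""
    return [
        [q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
        [2*(qx*qy + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
        [2*(qx*qz - q0*qy), 2*(qy*qz + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz],
    ]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def numeric_b6(R, u1, u2, D, n_rho=400, n_alpha=32):
    """Midpoint-rule estimate of (B.6) with l' = R l."""
    out = [[0.0] * 3 for _ in range(3)]
    h_rho, h_alpha = D / n_rho, 2 * math.pi / n_alpha
    for i in range(n_rho):
        rho = (i + 0.5) * h_rho
        for j in range(n_alpha):
            alpha = (j + 0.5) * h_alpha
            ell = [rho * math.cos(alpha), rho * math.sin(alpha), 1.0]
            lp = [dot(row, ell) for row in R]
            w = dot(u1, lp) * dot(u2, lp) * rho * h_rho * h_alpha
            for r in range(3):
                for c in range(3):
                    out[r][c] += w * lp[r] * lp[c]
    return out


def closed_b53(R, u1, u2, D):
    """Closed form (B.53), with w1 and w2 from (B.47)."""
    v3 = [R[0][2], R[1][2], R[2][2]]         # optical axis after rotation
    a, b = dot(u1, v3), dot(u2, v3)
    w1 = [u1[k] - a * v3[k] for k in range(3)]
    w2 = [u2[k] - b * v3[k] for k in range(3)]
    k4 = math.pi * D**4 / 4
    k2 = math.pi * D**2 * (1 - 3 * D**2 / 4) * a * b
    out = [[0.0] * 3 for _ in range(3)]
    for r in range(3):
        for c in range(3):
            eye = float(r == c)
            W = w1[r] * w2[c] + w2[r] * w1[c] + dot(w1, w2) * (eye - v3[r] * v3[c])
            U = u1[r] * u2[c] + u2[r] * u1[c] + dot(u1, u2) * eye
            out[r][c] = k4 * (D**2 / 6 - 1) * W + k4 * U + k2 * v3[r] * v3[c]
    return out
```

Setting $\mathbf{u}_1 = \mathbf{u}_2 = \mathbf{v}_3$ in closed_b53 reproduces (B.20), and setting only $\mathbf{u}_1 = \mathbf{v}_3$ reproduces (B.28), mirroring the special cases (B.29) and (B.54).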



